
Keyword Search Result

[Keyword] BERT (66 hits)

1-20 hits (66 hits)

  • Automated Labeling of Entities in CVE Vulnerability Descriptions with Natural Language Processing Open Access

    Kensuke SUMOTO  Kenta KANAKOGI  Hironori WASHIZAKI  Naohiko TSUDA  Nobukazu YOSHIOKA  Yoshiaki FUKAZAWA  Hideyuki KANUKA  

     
    PAPER

      Publicized:
    2024/02/09
      Vol:
    E107-D No:5
      Page(s):
    674-682

    Security-related issues have become more significant with the proliferation of IT. Collating security-related information in a database helps improve security. For example, Common Vulnerabilities and Exposures (CVE) is a security knowledge repository containing descriptions of vulnerabilities in software or source code. Although these descriptions include various entities, they lack a uniform entity structure, which makes security analysis based on individual entities difficult. Developing a consistent entity structure would benefit the security field. Herein we propose a method to automatically label selected entities in CVE descriptions by applying Named Entity Recognition (NER). We manually labeled 3287 CVE descriptions and conducted experiments using the BERT machine learning model to compare the proposed method with labeling based on regular expressions. Machine learning with the proposed method significantly improves labeling accuracy, achieving an F1 score of about 0.93, a precision of about 0.91, and a recall of about 0.95. These results demonstrate that our method has the potential to automatically label selected entities in CVE descriptions.
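The following minimal Python sketch makes the regular-expression baseline concrete. It assumes a hypothetical single entity type (version numbers) and BIO tags. The pattern, the tag name, and the function `regex_bio_labels` are illustrative assumptions, not the authors' rules, and the sketch does not show the BERT-based NER model that the paper evaluates.

```python
import re

# Hypothetical pattern for one entity type: version numbers such as "2.4.1".
VERSION = re.compile(r"\d+(?:\.\d+)+")
# Tokens: runs of word characters joined by dots, or single punctuation marks.
TOKEN = re.compile(r"\w+(?:\.\w+)*|[^\w\s]")


def regex_bio_labels(description):
    """Return (token, tag) pairs: version-like tokens get B-VERSION, all others O."""
    pairs = []
    for match in TOKEN.finditer(description):
        token = match.group()
        tag = "B-VERSION" if VERSION.fullmatch(token) else "O"
        pairs.append((token, tag))
    return pairs


print(regex_bio_labels("Buffer overflow in Foo 2.4.1 allows remote attackers."))
```

A rule of this kind only recognizes the surface forms its pattern anticipates. Multi-token entities would also need I- tags, which this sketch does not produce.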

  • PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction Open Access

    Li HE  Xiaowu ZHANG  Jianyong DUAN  Hao WANG  Xin LI  Liang ZHAO  

     
    PAPER

      Publicized:
    2023/10/25
      Vol:
    E107-D No:4
      Page(s):
    495-504

    Chinese spelling correction (CSC) models detect and correct typos in text based on the misspelled character and its context. Recently, BERT-based models have dominated research on Chinese spelling correction. However, these methods focus only on the semantic information of the text during pretraining and neglect learning to correct spelling errors. Moreover, when a text contains multiple incorrect characters, the context introduces noise, which makes it difficult for the model to accurately locate the incorrect characters and leads to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBert to the task of spelling correction. We propose a pretraining strategy based on self-distillation learning, in which a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn how to understand language and how to correct spelling errors. Additionally, we introduce a single-channel masking mechanism that masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during prediction. Finally, experiments on widely used benchmarks show that our model outperforms state-of-the-art methods by a substantial margin.

  • Hilbert Series for Systems of UOV Polynomials

    Yasuhiko IKEMATSU  Tsunekazu SAITO  

     
    PAPER

      Publicized:
    2023/09/11
      Vol:
    E107-A No:3
      Page(s):
    275-282

    Multivariate public key cryptosystems (MPKC) are based on the problem of solving systems of multivariate quadratic equations (the MQ problem). Among multivariate schemes, UOV is an important signature scheme because it underlies several other signature schemes, such as MAYO, QR-UOV, and Rainbow, the last of which was a finalist in the NIST PQC standardization project. Analyzing the security of a multivariate scheme requires analyzing the first fall degree or the solving degree of the polynomial systems used in specific attacks. These degrees are known to often relate to the Hilbert series of the ideal generated by the system. In this paper, we study the Hilbert series of the UOV scheme. More specifically, we study the Hilbert series of ideals generated by the quadratic polynomials used in the central map of UOV. In particular, we derive a prediction formula for the Hilbert series from experimental results. Moreover, we apply it to the analysis of the reconciliation attack on MAYO.
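For comparison, consider a generic (semi-regular) system of m homogeneous quadratics in n variables. Its Hilbert series is known to be [(1 - t^2)^m / (1 - t)^n]_+, truncated before the first non-positive coefficient. The degree of that coefficient is the degree of regularity. The minimal Python sketch below computes only this generic baseline. It is not the UOV-specific prediction formula derived in the paper, and the function name is hypothetical.

```python
from math import comb


def semi_regular_hilbert_series(n, m, max_degree=64):
    """Coefficients of [(1 - t^2)^m / (1 - t)^n]_+ and the degree of regularity.

    Assumes m generic (semi-regular) homogeneous quadratics in n variables.
    Returns (coefficients before the first non-positive one, its degree),
    or (coefficients up to max_degree, None) if none is found.
    """
    coeffs = []
    for d in range(max_degree + 1):
        # Coefficient of t^d in (1 - t^2)^m * sum_k C(n - 1 + k, k) t^k.
        c = sum((-1) ** j * comb(m, j) * comb(n - 1 + d - 2 * j, d - 2 * j)
                for j in range(min(m, d // 2) + 1))
        if c <= 0:
            return coeffs, d  # d is the degree of regularity
        coeffs.append(c)
    return coeffs, None
```

For example, 4 generic quadratics in 3 variables give the truncated series 1 + 3t + 2t^2 and a degree of regularity of 3.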

  • Effective Language Representations for Danmaku Comment Classification in Nicovideo

    Hiroyoshi NAGAO  Koshiro TAMURA  Marie KATSURAI  

     
    PAPER

      Publicized:
    2023/01/16
      Vol:
    E106-D No:5
      Page(s):
    838-846

    Danmaku commenting has become popular for co-viewing on video-sharing platforms such as Nicovideo. However, irrelevant comments often degrade the quality of the information that videos provide. This information pollution problem can be addressed by a comment classifier trained with an abstention option, which detects comments whose video categories are unclear. To improve performance on this classification task, this paper presents Nicovideo-specific language representations. Specifically, we used sentences from Nicopedia, a Japanese online encyclopedia of entities that may appear in Nicovideo content, to pre-train a Bidirectional Encoder Representations from Transformers (BERT) model. The resulting model, named Nicopedia BERT, is then fine-tuned to determine whether a given comment falls into any of the predefined categories. Experiments on Nicovideo comment data demonstrated the effectiveness of Nicopedia BERT compared with existing BERT models pre-trained on Wikipedia or tweets. We also evaluated each model on an additional sentiment classification task. The results suggest that Nicopedia BERT is applicable as a feature extractor for other social media text.

  • Auxiliary Loss for BERT-Based Paragraph Segmentation

    Binggang ZHUO  Masaki MURATA  Qing MA  

     
    PAPER-Natural Language Processing

      Publicized:
    2022/10/20
      Vol:
    E106-D No:1
      Page(s):
    58-67

    Paragraph segmentation is a text segmentation task. Iikura et al. achieved excellent paragraph segmentation results by introducing focal loss to Bidirectional Encoder Representations from Transformers (BERT). In this study, we investigated paragraph segmentation on the Daily News and Novel datasets. Building on the approach of Iikura et al., we trained the model with an auxiliary loss to improve paragraph segmentation performance. On the Daily News dataset, the average F1-score of the approach of Iikura et al. was 0.6704, whereas that of our approach was 0.6801, an improvement of approximately 1 percentage point. The improvement was also confirmed on the Novel dataset. Furthermore, two-tailed paired t-tests indicated that the difference in performance between the two approaches was statistically significant.

  • Convex and Differentiable Formulation for Inverse Problems in Hilbert Spaces with Nonlinear Clipping Effects Open Access

    Natsuki UENO  Shoichi KOYAMA  Hiroshi SARUWATARI  

     
    PAPER-Nonlinear Problems

      Publicized:
    2021/02/25
      Vol:
    E104-A No:9
      Page(s):
    1293-1303

    We propose a useful formulation for ill-posed inverse problems in Hilbert spaces with nonlinear clipping effects. Ill-posed inverse problems are often formulated as optimization problems. With the commonly used regularized least squares, nonlinear clipping effects may make the objective function nonconvex or nondifferentiable. To overcome these difficulties, we present a tractable formulation, based on the Bregman divergence associated with the primitive function of the clipping function, in which the objective function is convex and differentiable with respect to the optimization variables. By combining this formulation with the representer theorem, we need only deal with a finite-dimensional, convex, and differentiable optimization problem, which can be solved by well-established algorithms. We also present two practical inverse problems to which our theory applies: the estimation of band-limited signals and of time-harmonic acoustic fields. We evaluate the validity of our theory through numerical simulations.
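The key idea can be illustrated in one dimension. The Python sketch below assumes a symmetric clipping function clip(u) = max(-a, min(a, u)) and a scalar unknown u. Both are assumptions: the paper works in Hilbert spaces and uses the representer theorem, which is not shown. The primitive Phi of the clipping function is convex because clip is nondecreasing. When y = clip(v), the data term Phi(u) - y*u equals the Bregman divergence D_Phi(u, v) up to a constant in u, and its gradient clip(u) - y is continuous. Function names are hypothetical.

```python
def clip(u, a):
    """Symmetric clipping to [-a, a] (assumed clipping model)."""
    return max(-a, min(a, u))


def clip_primitive(u, a):
    """Phi(u) = integral of clip from 0 to u; convex and continuously differentiable."""
    if abs(u) <= a:
        return 0.5 * u * u
    return a * abs(u) - 0.5 * a * a


def bregman(u, v, a):
    """Bregman divergence D_Phi(u, v) = Phi(u) - Phi(v) - clip(v) * (u - v)."""
    return clip_primitive(u, a) - clip_primitive(v, a) - clip(v, a) * (u - v)


def data_fit(u, y, a):
    """Convex data term for a clipped observation y; equals D_Phi(u, v) + const if y = clip(v)."""
    return clip_primitive(u, a) - y * u


def data_fit_grad(u, y, a):
    """Gradient of data_fit with respect to u."""
    return clip(u, a) - y
```

By contrast, the least-squares term (clip(u) - y)^2 is in general neither convex nor differentiable at u = ±a. The data term above is flat wherever u is saturated, which is one reason a regularizer is still needed to select a solution.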

  • Real-Time Detection of Global Cyberthreat Based on Darknet by Estimating Anomalous Synchronization Using Graphical Lasso

    Chansu HAN  Jumpei SHIMAMURA  Takeshi TAKAHASHI  Daisuke INOUE  Jun'ichi TAKEUCHI  Koji NAKAO  

     
    PAPER-Information Network

      Publicized:
    2020/06/25
      Vol:
    E103-D No:10
      Page(s):
    2113-2124

    Cyberthreats have evolved and increased rapidly in recent years, so they must be detected and understood promptly and precisely to reduce their impact. A darknet, which is an unused IP address space, has a high signal-to-noise ratio. The global tendency of malicious traffic in cyberspace is therefore easier to understand from a darknet than from other observation networks. In this paper, we aim to capture global cyberthreats in real time. Multiple hosts infected with similar malware tend to behave similarly. We therefore propose a system that estimates the degree of synchronization from the packet transmission time patterns of the source hosts observed per unit time in the darknet, and detects anomalies in real time. In our evaluation, we use a proof-of-concept implementation of the proposed engine to demonstrate its feasibility and effectiveness, and we detect cyberthreats with an accuracy of 97.14%. This work is the first practical trial that detects cyberthreats from in-the-wild darknet traffic in real time, regardless of whether they are new types or variants, and quantitatively evaluates the results.

  • Sphere Packing Bound and Gilbert-Varshamov Bound for b-Symbol Read Channels

    Seunghoan SONG  Toru FUJIWARA  

     
    PAPER-Coding Theory

      Vol:
    E101-A No:11
      Page(s):
    1915-1924

    A b-symbol read channel is a channel model in which b consecutive symbols are read at once. As special cases, it includes the ordinary channel (b=1) and the symbol-pair read channel (b=2). The sphere packing bound, the Gilbert-Varshamov (G-V) bound, and the asymptotic G-V bound were previously known for b=1 and b=2. In this paper, we derive these three bounds for b-symbol read channels with b≥1. Analysis of the proposed G-V bound confirms that the achievable rate is higher for b-symbol read channels than for ordinary channels based on the Hamming metric. Furthermore, we show that the optimal value of b that maximizes the asymptotic G-V bound is finite and is determined by the fractional minimum distance.
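Here is a minimal Python sketch of the b-symbol read model. It assumes the usual cyclic convention, in which the i-th read is (x_i, ..., x_{i+b-1}) with indices taken modulo the code length n. Under that convention, the b-symbol distance is the Hamming distance between read vectors. The function names are hypothetical, and the sketch does not compute the bounds themselves.

```python
def b_symbol_read(x, b):
    """b-symbol read vector of x: the i-th read is (x_i, ..., x_{i+b-1}), indices cyclic."""
    n = len(x)
    return [tuple(x[(i + j) % n] for j in range(b)) for i in range(n)]


def b_symbol_distance(x, y, b):
    """Hamming distance between the b-symbol read vectors of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(rx != ry for rx, ry in zip(b_symbol_read(x, b), b_symbol_read(y, b)))
```

For example, x = 0000 and y = 1000 are at distance 1, 2, and 3 for b = 1, 2, and 3. This illustrates how reading more symbols at once can enlarge distances between words.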

  • TCP Network Coding with Adapting Parameters for Bursty and Time-Varying Loss

    Nguyen VIET HA  Kazumi KUMAZOE  Masato TSURU  

     
    PAPER-Fundamental Theories for Communications

      Publicized:
    2017/07/27
      Vol:
    E101-B No:2
      Page(s):
    476-488

    The Transmission Control Protocol (TCP) with Network Coding (TCP/NC) was proposed to enable packet loss recovery at the sink without TCP retransmission, which is achieved by proactively sending redundant combination packets encoded at the source. TCP/NC is expected to mitigate the goodput degradation of TCP over lossy networks. However, the original TCP/NC does not work well over burst-loss and time-varying channels, and it provided no clear scheme for deciding and adjusting the network coding-related parameters (NC parameters) to suit diverse and changing loss conditions. In this paper, we propose a solution that enables TCP/NC to adapt to these conditions, called TCP/NC with Loss Rate and Loss Burstiness Estimation (TCP/NCwLRLBE). To adapt to burst-loss channels, both the packet loss rate and the burstiness are estimated by observing transmitted packets. Appropriate NC parameters are then calculated from the estimated probability of a successfully recoverable transmission, based on a mathematical model of packet losses. Moreover, a new coding window handling mechanism is developed to update the NC parameters in the coding system promptly. The proposed scheme is implemented and validated in Network Simulator 3 with two different types of burst loss models. The results suggest that TCP/NCwLRLBE can mitigate TCP goodput degradation in both random-loss and burst-loss channels under time-varying conditions.
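The abstract does not say which loss model is used. As a minimal sketch, assume a simple two-state Gilbert model in which the state is just whether a packet is lost. In this model, p is the probability that a loss follows a delivered packet and r is the probability that a delivery follows a loss. The loss rate is then p/(p+r) and the mean burst length is 1/r. The Python code below (function names hypothetical) estimates p and r from a 0/1 loss trace. It then computes the probability that at most R of n consecutive coded packets are lost. Under the further assumption that n packets carrying k original packets can be decoded whenever at most n - k are lost, this is the probability that a coding window is recoverable.

```python
def estimate_gilbert(trace):
    """Estimate (p, r) of a simple Gilbert model from a loss trace (1 = lost, 0 = delivered)."""
    good_total = good_to_bad = bad_total = bad_to_good = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev == 0:
            good_total += 1
            good_to_bad += cur
        else:
            bad_total += 1
            bad_to_good += 1 - cur
    p = good_to_bad / good_total if good_total else 0.0
    r = bad_to_good / bad_total if bad_total else 1.0
    return p, r


def prob_at_most_losses(n, max_losses, p, r):
    """P(at most max_losses of n consecutive packets are lost), starting from the stationary state."""
    if p + r == 0:
        raise ValueError("p + r must be positive")
    pi_loss = p / (p + r)  # stationary loss probability
    # dist[s][k]: probability that the latest packet is in state s
    # (0 = delivered, 1 = lost) and k packets have been lost so far.
    dist = [[0.0] * (n + 1) for _ in range(2)]
    dist[0][0] = 1.0 - pi_loss
    dist[1][1] = pi_loss
    for _ in range(n - 1):
        nxt = [[0.0] * (n + 1) for _ in range(2)]
        for k in range(n):
            delivered, lost = dist[0][k], dist[1][k]
            nxt[0][k] += delivered * (1 - p) + lost * r
            nxt[1][k + 1] += delivered * p + lost * (1 - r)
        dist = nxt
    return sum(dist[0][k] + dist[1][k] for k in range(min(max_losses, n) + 1))
```

With p = q and r = 1 - q, the model reduces to independent losses at rate q, and the result matches the binomial probability. A scheme along these lines could choose the smallest redundancy n - k whose recoverability exceeds a target.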

  • Theoretical Analyses on 2-Norm-Based Multiple Kernel Regressors

    Akira TANAKA  Hideyuki IMAI  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E100-A No:3
      Page(s):
    877-887

    This paper discusses the solution of the standard 2-norm-based multiple kernel regression problem and the theoretical limit of the considered model space. We prove two results. 1) The solution of the 2-norm-based multiple kernel regressor constructed from a given training data set does not, in general, attain the theoretical limit of the considered model space in terms of generalization error, even if the training data set is noise-free. 2) Under a noise-free setting, the solution of the 2-norm-based multiple kernel regressor is identical to that of the single kernel regressor whose kernel is the sum of the kernels used in the multiple kernel regressor; this also holds in a noisy setting with the 2-norm-based regularizer. The first result motivates us to develop a novel framework for multiple kernel regression problems that yields a better solution, closer to the theoretical limit. The second result implies that, as long as the 2-norm-based criterion is used, it suffices to use a single kernel regressor with the sum of the given kernels instead of a multiple kernel regressor.

  • A Novel Lambertian-RBFNN for Office Light Modeling

    Wa SI  Xun PAN  Harutoshi OGAI  Katsumi HIRAI  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2016/04/18
      Vol:
    E99-D No:7
      Page(s):
    1742-1752

    In lighting control systems, accurate data on artificial light (lighting coefficients) are essential for illumination control accuracy and energy-saving efficiency. This research proposes a novel Lambertian-Radial Basis Function Neural Network (L-RBFNN) to model both the lighting coefficients and the illumination environment of an office. L-RBFNN adds a Lambertian neuron that represents the rough theoretical illuminance distribution of a lamp and modifies the RBF neurons to regulate the distribution shape. In this way, it solves the instability problem of the conventional RBFNN and achieves higher modeling accuracy. Both single-light and multiple-light modeling are simulated and compared with other methods such as the Lambertian function, cubic spline interpolation, and the conventional RBFNN. The results show that: 1) L-RBFNN models artificial light successfully, with imperceptible modeling error; 2) compared with other existing methods, L-RBFNN provides better performance with lower modeling error; and 3) the number of training sensors can be reduced to the number of lamps, making the method easier to apply in real-world lighting systems.

  • A New Class of Hilbert Pairs of Almost Symmetric Orthogonal Wavelet Bases

    Daiwei WANG  Xi ZHANG  

     
    PAPER-Digital Signal Processing

      Vol:
    E99-A No:5
      Page(s):
    884-891

    This paper proposes a new class of Hilbert pairs of almost symmetric orthogonal wavelet bases. For two wavelet bases to form a Hilbert pair, the corresponding scaling lowpass filters must satisfy the half-sample delay condition. In this paper, we simultaneously design two scaling lowpass filters that have arbitrarily specified flat group delay responses at ω=0 and satisfy the half-sample delay condition. In addition to specifying the number of vanishing moments, we apply the Remez exchange algorithm to minimize the difference between the frequency responses of the two scaling lowpass filters, in order to improve the analyticity of the complex wavelets. An equiripple error function can be obtained in a few iterations. The resulting complex wavelets are therefore orthogonal, almost symmetric, and have improved analyticity. Finally, several examples are presented to demonstrate the effectiveness of the proposed design method.

  • Ensemble and Multiple Kernel Regressors: Which Is Better?

    Akira TANAKA  Hirofumi TAKEBAYASHI  Ichigaku TAKIGAWA  Hideyuki IMAI  Mineichi KUDO  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E98-A No:11
      Page(s):
    2315-2324

    Over the last few decades, learning with multiple kernels, represented by the ensemble kernel regressor and the multiple kernel regressor, has attracted much attention in the field of kernel-based machine learning. Although many works have investigated their efficacy numerically, their theoretical grounding has not been investigated sufficiently, because no theoretical framework for evaluating them has been available. In this paper, we introduce a unified framework for evaluating kernel regressors with multiple kernels. On the basis of this framework, we analyze the generalization errors of the ensemble kernel regressor and the multiple kernel regressor. We also give a sufficient condition for the ensemble kernel regressor to outperform the multiple kernel regressor in terms of generalization error in the noise-free case. Finally, we give examples showing that, without the sufficient condition, either kernel regressor can outperform the other, which underlines the importance of the sufficient condition.

  • Far-Field Pattern Reconstruction Using an Iterative Hilbert Transform

    Fan FAN  Tapan K. SARKAR  Changwoo PARK  Jinhwan KOH  

     
    PAPER-Antennas and Propagation

      Vol:
    E98-B No:6
      Page(s):
    1032-1039

    This paper presents a new approach to reconstructing the missing part of an antenna far-field pattern. The far-field pattern can be reconstructed using the iterative Hilbert transform, which is based on the relationship between the real and imaginary parts given by the Hilbert transform. A moving average filter is used to reduce both the errors in the restored signal and the computational load. Under the constraint of causality of the current source in space, the data could be successfully reconstructed. Several examples involving line source antennas and antenna arrays are simulated to illustrate the applicability of this approach.

  • Hilbert Transform Based Time-of-Flight Estimation of Multi-Echo Ultrasonic Signals and Its Resolution Analysis

    Zhenkun LU  Cui YANG  Gang WEI  

     
    LETTER-Ultrasonics

      Vol:
    E97-A No:9
      Page(s):
    1962-1965

    In non-destructive testing (NDT), the received ultrasonic signal often consists of overlapping, noisy echoes, yet accurate estimation of the ultrasonic time-of-flight (TOF) is essential in NDT. In this letter, a novel method for estimating TOF from the signal envelope is proposed. First, wavelet denoising is applied to the noisy echo to improve estimation accuracy. Then, the Hilbert transform (HT) is used to extract the envelope of the echo. Finally, the TOF of each component of the multi-echo signal is estimated from the local maxima of the signal envelope. Furthermore, the time resolution for time-overlapping ultrasonic echoes is discussed. Numerical simulations were carried out to show the performance of the proposed method in estimating the TOF of ultrasonic signals.
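The envelope and local-maximum steps of this procedure can be sketched in Python. The analytic signal is formed by zeroing the negative frequencies of the DFT, a standard way to compute the Hilbert transform. Its magnitude gives the envelope, and local envelope maxima above a threshold are taken as TOF estimates in samples. The wavelet denoising step is omitted. The relative threshold of 0.5 and the function names are hypothetical. A naive O(N^2) DFT is used only to keep the sketch self-contained; in practice an FFT-based routine such as `scipy.signal.hilbert` would be used.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal x + j*H{x} via the DFT (naive O(N^2) transform, N even)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC and Nyquist, double positive frequencies, zero negative frequencies.
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    filtered = [s * w for s, w in zip(spectrum, weights)]
    return [sum(filtered[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def tof_estimates(x, rel_threshold=0.5):
    """Sample indices of local envelope maxima above rel_threshold * max(envelope)."""
    env = [abs(z) for z in analytic_signal(x)]
    limit = rel_threshold * max(env)
    return [i for i in range(1, len(env) - 1)
            if env[i] > limit and env[i] > env[i - 1] and env[i] >= env[i + 1]]


def echo(t, center, width=6.0, freq=0.2):
    """Hypothetical Gaussian-windowed tone burst centered at sample `center`."""
    return math.exp(-0.5 * ((t - center) / width) ** 2) * math.cos(2 * math.pi * freq * (t - center))


signal = [echo(t, 60) + 0.8 * echo(t, 180) for t in range(256)]
print(tof_estimates(signal))
```

For this synthetic two-echo signal, the estimates fall at samples 60 and 180, the burst centers.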

  • Analytic and Numerical Modeling of Normal Penetration of Early-Time (E1) High Altitude Electromagnetic Pulse (HEMP) into Dispersive Underground Multilayer Structures

    Hee-Do KANG  Il-Young OH  Tong-Ho CHUNG  Jong-Gwan YOOK  

     
    PAPER-Antennas and Propagation

      Vol:
    E96-B No:10
      Page(s):
    2625-2632

    In this paper, the penetration of an early-time (E1) high altitude electromagnetic pulse (HEMP) into dispersive underground multilayer structures is analyzed by electromagnetically modeling wave propagation in frequency-dependent lossy media. The electromagnetic pulse is treated over the power spectrum from 100 kHz to 100 MHz, because the power spectrum of the E1 HEMP falls rapidly to 30 dB below its maximum value beyond 100 MHz. In addition, the propagation channel, which consists of several dielectric materials, is modeled with the dispersive relative permittivity of each medium. Based on the source and channel models, the propagation is analyzed in the frequency and time domains. The attenuation levels at a point 100 m underground are observed to be about 15 dB and 20 dB at 100 kHz and 1 MHz, respectively, and the peak level of the penetrating electric field is found to be 5.6 kV/m. To ensure the causality of the result, we use the Hilbert transform.

  • On Kernel Parameter Selection in Hilbert-Schmidt Independence Criterion

    Masashi SUGIYAMA  Makoto YAMADA  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E95-D No:10
      Page(s):
    2564-2567

    The Hilbert-Schmidt independence criterion (HSIC) is a kernel-based statistical independence measure that can be computed very efficiently. However, it requires us to determine the kernel parameters heuristically because no objective model selection method is available. Least-squares mutual information (LSMI) is another statistical independence measure that is based on direct density-ratio estimation. Although LSMI is computationally more expensive than HSIC, LSMI is equipped with cross-validation, and thus the kernel parameter can be determined objectively. In this paper, we show that HSIC can actually be regarded as an approximation to LSMI, which allows us to utilize cross-validation of LSMI for determining kernel parameters in HSIC. Consequently, both computational efficiency and cross-validation can be achieved.
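Here is a minimal Python sketch of the biased empirical HSIC estimator tr(K H L H)/(n-1)^2. K and L are Gaussian Gram matrices of scalar samples x and y, and H = I - (1/n)11^T is the centering matrix. The Gaussian kernel and the default widths sigma_x = sigma_y = 1 are assumptions. Choosing these widths objectively is exactly the problem the letter addresses through LSMI cross-validation, which is not shown. Function names are hypothetical.

```python
import math


def gaussian_gram(values, sigma):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values] for a in values]


def double_center(k):
    """Return H K H, where H = I - (1/n) 1 1^T, by subtracting row and column means."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, with Gaussian kernels."""
    n = len(x)
    kc = double_center(gaussian_gram(x, sigma_x))
    l = gaussian_gram(y, sigma_y)
    # tr(K H L H) = tr((H K H) L) = sum_ij (HKH)_ij L_ji
    return sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2
```

With a constant y, the entries of the centered Gram matrix sum to zero and the estimate is zero. This is the expected result for a variable that carries no information about x.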

  • Movement-Imagery Brain-Computer Interface: EEG Classification of Beta Rhythm Synchronization Based on Cumulative Distribution Function

    Teruyoshi SASAYAMA  Tetsuo KOBAYASHI  

     
    PAPER-Human-computer Interaction

      Vol:
    E94-D No:12
      Page(s):
    2479-2486

    We developed a novel movement-imagery-based brain-computer interface (BCI) for untrained subjects that does not employ machine learning techniques. The BCI was developed in several steps. First, spline Laplacian analysis was performed. Next, time-frequency analysis was applied to determine the optimal frequency range and latencies of the electroencephalograms (EEGs). Finally, trials were classified as right or left based on β-band event-related synchronization, using the cumulative distribution function of pretrigger EEG noise. To test the performance of the BCI, EEGs were measured from 63 locations over the entire scalp in eight healthy subjects during the execution and imagination of right and left wrist-bending movements. The highest classification accuracies were 84.4% for real movements and 77.8% for their imageries. The accuracy is significantly higher than that of previously reported machine-learning-based BCIs in the movement imagery task (paired t-test, p < 0.05). It was also demonstrated that the highest accuracy was achieved even though the subjects had never performed movement imagery before.

  • Evaluation of GPU-Based Empirical Mode Decomposition for Off-Line Analysis

    Pulung WASKITO  Shinobu MIWA  Yasue MITSUKURA  Hironori NAKAJO  

     
    PAPER

      Vol:
    E94-D No:12
      Page(s):
    2328-2337

    In off-line analysis, the demand for high-precision signal processing has led to the use of Empirical Mode Decomposition (EMD), a method for analyzing complex data sets. Unfortunately, EMD is highly compute-intensive. In this paper, we present a parallel implementation of EMD on a GPU. We propose a “partial+total” switching method to increase performance while maintaining precision. We also focus on reducing the computational complexity of this method from O(N) on a single CPU to O(N/P log (N)) on a GPU. Evaluation results show that our single-GPU implementation using a Tesla C2050 (Fermi architecture) achieves a 29.9x partial speedup and an 11.8x total speedup compared with a single Intel dual-core CPU.

  • Hilbert Scan Based Bag-of-Features for Image Retrieval

    Pengyi HAO  Sei-ichiro KAMATA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E94-D No:6
      Page(s):
    1260-1268

    Generally, two problems of bag-of-features in image retrieval are still considered unsolved. First, spatial information about descriptors is not exploited well, which affects retrieval accuracy. Second, there is a trade-off between vocabulary size and precision, which determines storage and retrieval performance. In this paper, we propose a novel approach called Hilbert scan based bag-of-features (HS-BoF) for image retrieval. First, we study a Hilbert scan based tree representation (HSBT). It is built from the local descriptors, and a novel grouping rule adds spatial relationships to its nodes, resulting in a tree structure for each image. Further, we give two ways of producing codebooks based on HSBT: a multi-layer codebook and a multi-size codebook. Owing to the properties of Hilbert scanning and the merits of our grouping method, sub-regions of the tree are not only flexible with respect to the distribution of local patches but also have hierarchical relations. Extensive experiments on Caltech-256, 13-scene, and 1 million ImageNet images show that HS-BoF achieves higher accuracy with less memory usage.
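The Hilbert scan itself can be sketched with the well-known iterative index-to-coordinate mapping for a Hilbert curve on a 2^k × 2^k grid. The Python functions below are illustrative and their names are hypothetical. They show only the scan order; HSBT construction and the grouping rule are not reproduced.

```python
def _rotate(size, x, y, rx, ry):
    """Rotate/flip a quadrant so that the sub-curve has the correct orientation."""
    if ry == 0:
        if rx == 1:
            x, y = size - 1 - x, size - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(side, d):
    """Map index d along the Hilbert curve to (x, y) on a side x side grid (side a power of 2)."""
    x = y = 0
    s = 1
    t = d
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scan(image):
    """Pixel values of a square 2^k x 2^k image (list of rows) in Hilbert-scan order."""
    n = len(image)
    return [image[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]


print(hilbert_scan([[0, 1], [10, 11]]))
```

Consecutive positions in the scan are always grid neighbors. This locality property is what makes Hilbert scanning attractive for preserving spatial relationships.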
