Huaijin DENG Takehito UTSURO Akio KOBAYASHI Hiromitsu NISHIZAKI
There have been many previous studies on the fluency evaluation of spontaneous speech. However, most of them focus on lexical cues, and little emphasis is placed on how diverse acoustic features and deep end-to-end models contribute to improving performance. In this paper, we describe a multi-layer neural network that considers not only lexical features extracted from transcriptions but also utterance-level acoustic features from audio data. We also conduct experiments to investigate the performance of end-to-end approaches with mel-spectrogram input on this task. As the speech fluency evaluation task, we evaluate the proposed method on two binary classification tasks: fluent speech detection and disfluent speech detection. Speech data of around 10 seconds duration each, annotated with the three classes "fluent," "neutral," and "disfluent," is used for evaluation. According to the two-way splits of these three classes, the fluent speech detection task is defined as the binary classification of fluent vs. neutral and disfluent, while the disfluent speech detection task is defined as the binary classification of fluent and neutral vs. disfluent. We then conduct experiments for the comparative evaluation of the multi-layer neural network with diverse features as well as the end-to-end models. For fluent speech detection, in the comparison of utterance-level disfluency-based, prosodic, and acoustic features with the multi-layer neural network, disfluency-based and prosodic features alone perform better. More specifically, the performance improved substantially when all of the acoustic features were removed from the full feature set, whereas it degraded substantially when filler-related features were removed. Overall, however, the end-to-end Transformer+VGGNet model with mel-spectrogram input achieves the best results. For disfluent speech detection, the multi-layer neural network using disfluency-based, prosodic, and acoustic features without fillers achieves the best results. The end-to-end Transformer+VGGNet architecture also obtains high scores, but it is exceeded by the best results of the multi-layer neural network with a statistically significant difference. Thus, unlike in fluent speech detection, disfluency-based and prosodic features other than fillers remain necessary for disfluent speech detection.
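A minimal sketch of the kind of multi-layer classifier described above, not the authors' exact model: a small feed-forward network scoring an utterance as fluent vs. not from utterance-level feature vectors. The feature dimension, layer sizes, and training hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FluencyMLP(nn.Module):
    """Binary fluency classifier over utterance-level features (sketch)."""
    def __init__(self, n_features=30, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),  # e.g. disfluency-based + prosodic features
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),           # single logit: fluent vs. neutral+disfluent
        )

    def forward(self, x):
        return self.net(x)

model = FluencyMLP()
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data.
x = torch.randn(16, 30)                     # batch of utterance-level feature vectors
y = torch.randint(0, 2, (16, 1)).float()    # 1 = fluent, 0 = neutral/disfluent
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```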
Kento HASEGAWA Masao YANAGISAWA Nozomu TOGAWA
Recently, it has been reported that malicious third-party IC vendors often insert hardware Trojans into their products. Especially in the IC design step, malicious third-party vendors can easily insert hardware Trojans into their products, and thus we have to detect them efficiently. In this paper, we propose a machine-learning-based hardware-Trojan detection method for gate-level netlists using multi-layer neural networks. First, we extract 11 Trojan-net feature values for each net in a netlist. After that, we classify the nets in an unknown netlist into a set of Trojan nets and a set of normal nets using multi-layer neural networks. By experimentally optimizing the structure of the multi-layer neural networks, we obtain an average true positive rate of 84.8% and an average true negative rate of 70.1%, reaching a 100% true positive rate on some of the benchmarks, which outperforms the existing methods in most cases.
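A minimal sketch of the classification stage, assuming the 11 per-net feature values have already been extracted from the gate-level netlist; the feature extraction itself and the real benchmark data are not shown, and random stand-in arrays are used in their place.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 11))    # 11 Trojan-net feature values per net
y_train = rng.integers(0, 2, size=1000)  # 1 = Trojan net, 0 = normal net
X_test = rng.normal(size=(200, 11))
y_test = rng.integers(0, 2, size=200)

# Multi-layer neural network; the hidden-layer structure is the quantity
# being optimized experimentally in the paper.
clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

# True positive / true negative rates, the paper's evaluation measures.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
print("TPR:", tp / (tp + fn), "TNR:", tn / (tn + fp))
```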
Eiko SUGAWARA Masaru FUKUSHI Susumu HORIGUCHI
This paper addresses the issue of reconfiguring multi-layer neural networks implemented in single or multiple VLSI chips. The ability to adaptively reconfigure the network for a given application, considering the presence of faulty neurons, is a very valuable feature in large-scale neural networks. In addition, it has become necessary to build systems that can automatically reconfigure a network and acquire optimal weights without any help from host computers. However, self-reconfigurable architectures for neural networks have not been studied sufficiently. In this paper, we propose an architecture for a self-reconfigurable multi-layer neural network that employs both reconfiguration with spare neurons and weight training by genetic algorithms (GAs). This proposal offers the combined advantages of low hardware overhead for adding spare neurons and fast weight training. To demonstrate the feasibility of self-reconfigurable neural networks, a prototype system has been implemented on a field-programmable gate array.
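A minimal software sketch of the two ideas combined, under strong simplifying assumptions (the paper describes a hardware architecture): a hidden layer carries spare neurons that are enabled when a working neuron is marked faulty, and the weights are trained by a simple GA with fitness equal to negative MSE. All sizes, the fault pattern, and the GA settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_SPARE, N_OUT = 4, 6, 2, 1
H = N_HID + N_SPARE
alive = np.ones(H, dtype=bool)            # fault mask over hidden + spare units
alive[N_HID:] = False                     # spares start disabled
alive[2] = False; alive[N_HID] = True     # neuron 2 faulty -> activate one spare

def forward(genome, x):
    """Decode a flat genome into layer weights and run the masked network."""
    W1 = genome[:N_IN * H].reshape(N_IN, H)
    W2 = genome[N_IN * H:].reshape(H, N_OUT)
    h = np.tanh(x @ W1) * alive           # faulty/disabled units contribute nothing
    return h @ W2

X = rng.normal(size=(64, N_IN))
y = np.sin(X.sum(axis=1, keepdims=True))  # toy target function

def mse(genome):
    return np.mean((forward(genome, X) - y) ** 2)

# Simple truncation-selection GA with elitism and Gaussian mutation.
pop = rng.normal(size=(40, N_IN * H + H * N_OUT))
for gen in range(200):
    scores = np.array([mse(g) for g in pop])
    parents = pop[np.argsort(scores)[:10]]
    children = parents[rng.integers(0, 10, size=30)] \
        + 0.1 * rng.normal(size=(30, pop.shape[1]))
    pop = np.vstack([parents, children])

best = min(pop, key=mse)
print("final MSE with spare neuron active:", mse(best))
```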
Isao NAKANISHI Yoshio ITOH Yutaka FUKUI
As a nonlinear adaptive filter, the neural filter is used to process nonlinear signals and/or systems. However, the neural filter requires a large number of iterations to converge. This letter presents a new structure for the multi-layer neural filter in which an orthonormal transform is introduced into all inter-layers to accelerate convergence. The proposed structure is called the transform domain neural filter (TDNF) for convenience. The weights are basically updated by the back-propagation (BP) algorithm, but the algorithm must be modified since the error back-propagates through the orthonormal transform. Moreover, a variable step size normalized by the transformed signal power is introduced into the BP algorithm to exploit the orthonormal transform. Computer simulation confirms that the introduction of the orthonormal transform is effective in speeding up the convergence of the neural filter.
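A minimal sketch in the spirit of transform domain adaptive filtering, under simplifying assumptions rather than the paper's exact TDNF structure: an orthonormal DCT is applied only to the tap-delay input vector (not to all inter-layers), and the BP step size for the first layer is normalized by a running estimate of the per-bin transformed signal power, as in transform-domain LMS.

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
N, H, mu, eps = 8, 6, 0.05, 1e-6
W1 = rng.normal(scale=0.3, size=(N, H))
W2 = rng.normal(scale=0.3, size=(H, 1))
power = np.ones(N)                        # running per-bin power estimate

x = rng.normal(size=2000)
d = np.tanh(np.convolve(x, [0.6, 0.3, 0.1], mode="same"))  # toy nonlinear system

for n in range(N, len(x)):
    u = dct(x[n-N:n][::-1], norm="ortho")      # orthonormal transform of the taps
    power = 0.99 * power + 0.01 * u**2
    h = np.tanh(u @ W1)                        # sigmoidal hidden layer
    e = d[n] - float(h @ W2)                   # instantaneous error
    # BP update; the first-layer step size is normalized per transform bin.
    W2 += mu * e * h[:, None]
    W1 += mu * e * np.outer(u / (power + eps), W2[:, 0] * (1 - h**2))
    if n % 500 == 0:
        print(n, e**2)
```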
Ashraf A. M. KHALAF Kenji NAKAYAMA
Time series prediction is a very important technology in a wide variety of fields. Actual time series contain both linear and nonlinear properties, and the amplitude of the time series to be predicted is usually continuous-valued. For these reasons, we combine nonlinear and linear predictors in a cascade form. The nonlinear prediction problem is reduced to pattern classification: a set of past samples x(n-1), ..., x(n-N) is transformed into the output, which is the prediction of the next sample x(n). We therefore employ a multi-layer neural network with a sigmoidal hidden layer and a single linear output neuron for the nonlinear prediction; it is called the Nonlinear Sub-Predictor (NSP). The NSP is trained by a supervised learning algorithm using the sample x(n) as the target. However, it is rather difficult for the NSP to generate continuous amplitudes and to predict the linear properties, so we employ a linear predictor after the NSP. An FIR filter is used for this purpose and is called the Linear Sub-Predictor (LSP). The LSP is also trained by a supervised learning algorithm using x(n) as the target. In order to estimate the minimum size of the proposed predictor, we analyze the nonlinearity of the time series of interest. Prediction is equivalent to mapping a set of past samples to the next sample, and the multi-layer neural network is well suited to this kind of pattern mapping. Still, difficult mappings may exist when several sets of very similar patterns are mapped onto very different samples. The degree of difficulty of the mapping is closely related to the nonlinearity, and the number of past samples needed for prediction is determined by it: a difficult mapping requires a large number of past samples. Computer simulations using sunspot data and artificially generated discrete-amplitude data demonstrate the efficiency of the proposed predictor and the nonlinearity analysis.
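A minimal sketch of the cascade on a toy series, assuming the LSP is an FIR filter applied to the current and past NSP outputs; the tap counts, hidden-layer size, and the use of batch least squares for the FIR weights (standing in for supervised adaptation) are illustrative choices, not the paper's settings.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(600)) + 0.1 * rng.normal(size=600)  # toy series
N, M = 8, 4                                  # NSP input taps, LSP taps

# Stage 1: NSP, an MLP with a sigmoidal hidden layer and a single linear
# output neuron, trained with x(n) as the target.
X = np.stack([x[n-N:n][::-1] for n in range(N, len(x))])
t = x[N:]
nsp = nn.Sequential(nn.Linear(N, 16), nn.Sigmoid(), nn.Linear(16, 1))
opt = torch.optim.Adam(nsp.parameters(), lr=1e-2)
Xt = torch.tensor(X, dtype=torch.float32)
tt = torch.tensor(t, dtype=torch.float32).unsqueeze(1)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(nsp(Xt), tt)
    loss.backward()
    opt.step()

# Stage 2: LSP, an FIR filter over the NSP outputs, also fit against x(n).
y = nsp(Xt).detach().numpy().ravel()
Y = np.stack([y[m-M:m][::-1] for m in range(M, len(y))])
w, *_ = np.linalg.lstsq(Y, t[M:], rcond=None)
print("cascade MSE:", np.mean((Y @ w - t[M:]) ** 2))
```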