Federico ANG Rowena Cristina GUEVARA Yoshikazu MIYANAGA Rhandley CAJOTE Joel ILAO Michael Gringo Angelo BAYONA Ann Franchesca LAGUNA
In this paper, a new database suitable for HMM-based automatic Filipino speech recognition is described for the purpose of training a domain-independent, large-vocabulary continuous speech recognition system. Although high-performance speech recognition systems are known to depend on a high-quality speech database for training, no such database has been available, so previous reports on Filipino speech recognition had to contend with serious data sparsity issues. In this paper we alleviate such sparsity through appropriate data analysis that makes the evaluation results more reliable. The best system is identified by its low word-error rate on a cross-validation set containing almost three hours of unseen speech data. Language-dependent problems are discussed, and their impact on accuracy is analyzed. The approach is currently data driven; however, it serves as a competent baseline model for future developments.
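The word-error rate used above as the selection criterion is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring tool; the function name and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution in a three-word reference.
print(word_error_rate("magandang umaga po", "magandang gabi po"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.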
Kazi OBAIDULLAH Constantin SIRITEANU Shingo YOSHIZAWA Yoshikazu MIYANAGA
The genetic algorithm (GA) is now an important tool in the field of wireless communications. For multiple-input/multiple-output (MIMO) wireless communications systems employing spatial multiplexing transmission, we evaluate the relationship between GA parameter values and channel parameters in fading channels. We assume transmit-correlated Rayleigh and Rician fading with a realistic Laplacian power azimuth spectrum. The azimuth spread (AS) and Rician K-factor are selected according to the measurement-based WINNER II channel model for several scenarios. We show the effects of GA parameters and channel parameters for different WINNER II scenarios (i.e., AS and K values) and ranks of the deterministic component. We employ a meta GA that suitably selects the population size (P), number of generations (G), and mutation probability (pm) for the inner GA. We then show the cumulative distribution function (CDF), obtained experimentally, of the condition number C of the channel matrix H. We find that the GA parameters depend on the channel parameters, i.e., the GA parameters are functions of the channel parameters. We also find that poorer channel conditions require smaller GA parameter values for MIMO detection. This approach will help to achieve maximum performance under practical conditions at lower numerical complexity.
Jun'ya SHIMIZU Yoshikazu MIYANAGA Koji TOCHINAI
In recent years, fractal processes have played important roles in various application fields. Since a 1/f process possesses statistical self-similarity, it is considered a main part of fractal signal modeling. On the other hand, noise reduction is often needed in real-world signal processing. Hence, we propose an enhancement algorithm for a 1/f signal disturbed by white noise. The algorithm is based on constrained minimization in the wavelet domain: the power of the 1/f signal distortion in the wavelet domain is minimized under the constraint that the power of the residual noise in the wavelet domain is smaller than a threshold level. We solve this constrained minimization problem using a Lagrangian equation. We also consider a method for setting the Lagrange multiplier in the proposed algorithm. In addition, we confirm through computer simulations that the proposed algorithm with this Lagrange multiplier setting method obtains better enhancement results than the conventional algorithm.
Yusaku KANETA Shingo YOSHIZAWA Shin-ichi MINATO Hiroki ARIMURA Yoshikazu MIYANAGA
In this paper, we propose a novel architecture for large-scale regular expression matching, called the dynamically reconfigurable bit-parallel NFA architecture (Dynamic BP-NFA), which allows dynamic loading of regular expressions on the fly as well as efficient pattern matching for fast data streams. This is the first dynamically reconfigurable hardware with guaranteed performance for the class of extended patterns, which is a subclass of regular expressions consisting of unions of characters and their repetitions. This class allows operators such as character classes, gaps, optional characters, and bounded and unbounded repeats of character classes. The key to our architecture is the use of a bit-parallel pattern matching approach, in which the information of an input non-deterministic finite automaton (NFA) is first compactly encoded in bit-masks stored in a collection of registers and block RAMs. Then, the NFA is efficiently simulated by fixed circuitry using bitwise Boolean and arithmetic operations, consuming one input character per clock regardless of the actual contents of the input text. Experimental results showed that our hardware for both string and extended patterns was comparable in performance to previous dynamically reconfigurable hardware.
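The bit-parallel idea described above can be illustrated in software with the classic Shift-And algorithm, in which one machine word holds the set of active NFA states and each input character triggers one shift and one mask operation. This is only a simplified sketch covering strings of character classes; it does not implement gaps, optional characters, or repeats, and it is not the Dynamic BP-NFA circuit itself. The function names and the example pattern are assumptions made for illustration.

```python
def build_masks(pattern):
    """pattern: a list of character classes, each given as a set of characters.

    Bit i of masks[ch] is set when position i of the pattern accepts ch.
    """
    masks = {}
    for i, char_class in enumerate(pattern):
        for ch in char_class:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def shift_and_search(pattern, text):
    """Return the start offsets of all matches, reading one character per step."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    masks = build_masks(pattern)
    accept = 1 << (m - 1)
    state = 0  # bit i set: pattern[0..i] matches the text ending at this position
    hits = []
    for pos, ch in enumerate(text):
        # Advance every active state by one and start a new attempt (the | 1),
        # then keep only the states whose position accepts ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


# Hypothetical pattern a[bc]d, written as a list of character classes.
print(shift_and_search([set("a"), set("bc"), set("d")], "xabdacdad"))
```

The per-character work is constant regardless of the text contents, which mirrors the one-character-per-clock property claimed for the hardware; changing the pattern only requires rewriting the mask table.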
Shingo YOSHIZAWA Yasushi YAMAUCHI Yoshikazu MIYANAGA
This paper presents a VLSI architecture for MMSE detection in a 4×4 MIMO-OFDM receiver. Packet-based MIMO-OFDM imposes a considerable throughput requirement on the matrix inversion because of the strict timing of the frame structure and the subcarrier-by-subcarrier processing. Pipeline-oriented algorithms are preferable for tackling this issue. We propose a pipelined MMSE detector using Strassen's algorithms for matrix inversion and multiplication. This circuit achieves real-time operation that does not depend on the number of subcarriers. The designed circuit has been implemented in a 90-nm CMOS process and shows the potential to provide a 2.6-Gbps transmission speed in a 160-MHz signal bandwidth.
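Strassen's matrix inversion works on a 2×2 block partition and reduces one inversion to two half-size inversions plus a handful of block multiplications, which is what makes it attractive for pipelining. The sketch below follows the textbook recursion in plain Python and assumes a square matrix whose size is a power of two and whose leading blocks are invertible (true for the Hermitian positive-definite matrices that arise in MMSE detection). It uses ordinary block multiplication rather than Strassen's fast multiplication, and it is not the pipelined circuit described above; all helper names are illustrative.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_sub(a, b):
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def split(a):
    h = len(a) // 2
    return ([row[:h] for row in a[:h]], [row[h:] for row in a[:h]],
            [row[:h] for row in a[h:]], [row[h:] for row in a[h:]])


def strassen_inverse(a):
    """Invert a 2^k x 2^k matrix by Strassen's block recursion."""
    if len(a) == 1:
        return [[1 / a[0][0]]]
    a11, a12, a21, a22 = split(a)
    r1 = strassen_inverse(a11)
    r2 = mat_mul(a21, r1)
    r3 = mat_mul(r1, a12)
    r4 = mat_mul(a21, r3)
    r5 = mat_sub(r4, a22)            # negative of the Schur complement
    r6 = strassen_inverse(r5)
    c12 = mat_mul(r3, r6)
    c21 = mat_mul(r6, r2)
    c11 = mat_sub(r1, mat_mul(r3, c21))
    c22 = [[-x for x in row] for row in r6]
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


# Hypothetical 4x4 symmetric, diagonally dominant test matrix.
A = [[4, 1, 0, 1], [1, 5, 1, 0], [0, 1, 6, 1], [1, 0, 1, 7]]
A_inv = strassen_inverse(A)
```

The same code accepts complex entries, since it relies only on addition, multiplication, and scalar division.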
Hisayoshi KANO Shingo YOSHIZAWA Takashi GUNJI Shougo OKAMOTO Morio TAWARAYAMA Yoshikazu MIYANAGA
The IEEE 802.11ac task group has announced the use of a wider channel that extends the channel bandwidth to more than 80 MHz. We present an experimental platform consisting of a baseband unit and an RF unit in a 2×2 MIMO-OFDM system for the wider channel and report its system performance results from a field experiment. The MIMO-OFDM transceiver in the baseband unit has been designed to perform MIMO detection in real time and provides a maximum data rate of 600 Mbps. OFDM tends to cause a high peak-to-average power ratio (PAPR) for wider channels, which distorts the power amplifier output in the RF unit. We have reduced the non-linear distortion by optimizing the OFDM preamble and evaluated its performance by conducting a simulation that integrates the baseband processing and the RF unit. In the field experiment, our platform tested the communication performance in a farm environment and a passage environment.
Myat Hsu AUNG Hiroshi TSUTSUI Yoshikazu MIYANAGA
In this paper, we propose a WiFi-based indoor positioning system using a fingerprint method, whose database is constructed with estimated reference locations. The reference locations and their information, called data sets in this paper, are obtained by moving reference devices at a constant speed while gathering information on available access points (APs). In this approach, the reference locations can be estimated from the velocity without any precise reference location information. Therefore, the cost of database construction can be dramatically reduced. However, each data set includes some errors due to factors such as the fluctuation of received signal strength indicator (RSSI) values, device-specific WiFi sensitivities, and AP installations and removals. In this paper, we propose a method to merge data sets to construct a consistent database that suppresses such undesired effects. The proposed approach assumes that the intervals of reference locations in the database are constant and that the fingerprint for each reference location is calculated from multiple data sets. Through experimental results, we reveal that our approach can achieve an accuracy of 80%. We also present a detailed discussion of the results in relation to the parameters of the proposed approach.
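The database construction described above can be sketched as follows, under simplifying assumptions that are not taken from the paper: the reference device moves along a straight path, each scan is timestamped relative to the start of the walk, data sets are merged by plain averaging per reference location, and positioning uses nearest-neighbor matching in RSSI space. The function names, the `MISSING_RSSI` floor, and the sample values are all hypothetical.

```python
from collections import defaultdict

MISSING_RSSI = -100.0  # assumed value for an AP that was not observed


def build_database(data_sets, speed_mps, interval_m):
    """Merge walks into fingerprints at constant-interval reference locations.

    data_sets: list of walks; each walk is a list of
               (elapsed_seconds, {ap_id: rssi_dbm}) scans.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for walk in data_sets:
        for elapsed_s, scan in walk:
            # Location is estimated from the constant speed alone.
            cell = round(speed_mps * elapsed_s / interval_m)
            for ap, rssi in scan.items():
                sums[cell][ap] += rssi
                counts[cell][ap] += 1
    return {cell * interval_m: {ap: sums[cell][ap] / counts[cell][ap]
                                for ap in sums[cell]}
            for cell in sums}


def locate(database, scan):
    """Return the reference location whose fingerprint is closest to the scan."""
    def squared_distance(fingerprint):
        aps = set(fingerprint) | set(scan)
        return sum((fingerprint.get(ap, MISSING_RSSI)
                    - scan.get(ap, MISSING_RSSI)) ** 2 for ap in aps)
    return min(database, key=lambda pos: squared_distance(database[pos]))


# Two hypothetical walks along the same corridor at 1 m/s.
walk_a = [(0.0, {"ap1": -40, "ap2": -70}), (1.0, {"ap1": -50, "ap2": -60}),
          (2.0, {"ap1": -60, "ap2": -50})]
walk_b = [(0.0, {"ap1": -42, "ap2": -68}), (1.0, {"ap1": -52, "ap2": -62}),
          (2.0, {"ap1": -58, "ap2": -52})]
db = build_database([walk_a, walk_b], speed_mps=1.0, interval_m=1.0)
print(locate(db, {"ap1": -51, "ap2": -60}))
```

Averaging over several walks is the simplest way to damp RSSI fluctuation; a merge rule that also handles device-specific sensitivity or AP changes would need additional logic.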
Wichai BOONKUMKLAO Yoshikazu MIYANAGA Kobchai DEJHAN
In this paper, we introduce a flexible design for intellectual property (IP), which has become important in system LSI design. The proposed IPs offer high flexibility with respect to user requirements. The design priority is determined by setting parameters such as the number of arithmetic units, the internal bit length, and the clock speed. The design time can thus be reduced. The designed IP is based on a reconfigurable architecture in which many structures can be dynamically selected. This paper shows an implementation of a frequency response masking (FRM) digital filter and principal component analysis (PCA) using a reconfigurable architecture. We show the method for realizing the designed circuit and the results of experiments using a field-programmable gate array (FPGA).
Shingo YOSHIZAWA Noboru HAYASAKA Naoya WADA Yoshikazu MIYANAGA
This paper describes a noise robustness technique that normalizes the cepstral amplitude range in order to remove the influence of additive noise. Additive noise causes speech feature mismatches between testing and training environments and degrades recognition accuracy in noisy environments. We presume an approximate model that expresses this influence as a change in the amplitude range and the DC component of the log-spectra. According to this model, we propose cepstral amplitude range normalization (CARN), which normalizes the cepstral distance between the maximum and minimum values. It can estimate noise-robust features without prior knowledge or adaptation. We evaluated its performance in an isolated word recognition task using the NOISEX-92 database. Compared with combinations of conventional methods, CARN improved recognition accuracy under various SNR conditions.
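One way to read the model above is that noise compresses each feature trajectory's range and shifts its offset, so rescaling every cepstral coefficient to a fixed range over the utterance cancels both effects. The sketch below implements plain per-coefficient min-max range normalization under that assumption. It is not the published CARN formula, and the function names and data are hypothetical.

```python
def normalize_range(trajectory):
    """Center one coefficient's trajectory and scale it to a unit range."""
    lo, hi = min(trajectory), max(trajectory)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in trajectory]  # constant trajectory carries no shape
    mid = (hi + lo) / 2
    return [(c - mid) / span for c in trajectory]


def range_normalize_features(frames):
    """frames: list of per-frame cepstral vectors, all of the same length."""
    columns = [normalize_range(list(column)) for column in zip(*frames)]
    return [list(row) for row in zip(*columns)]


# Hypothetical features: the "noisy" version is the clean one with its range
# halved and an offset of 4 added, mimicking the assumed distortion model.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
noisy = [[0.5 * c + 4.0 for c in frame] for frame in clean]
print(range_normalize_features(clean) == range_normalize_features(noisy))
```

Because the scaling uses only the utterance's own minimum and maximum, no noise estimate or adaptation data is needed, which matches the property stated in the abstract.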
Hideaki IMAI Yoshikazu MIYANAGA Koji TOCHINAI
This paper proposes a nonlinear signal processing method using a three-layered network that is trained with self-organized clustering and supervised learning. The network consists of three layers, i.e., a self-organized layer, an evaluation layer, and an output layer. Since the evaluation layer is designed as a simple perceptron network and the output layer is designed as a fixed-weight linear node, the training complexity is the same as that of a conventional network consisting of self-organized clustering and a simple perceptron network. In other words, quite high-speed training can be realized. Generally speaking, since the data range is arbitrarily large in signal processing, the network should cover this range and output a value as accurately as possible. However, it may be hard for a single node in the network to output these data. If this dynamic range is instead covered by several nodes, the complexity of each node is reduced and the associated range is also limited. This results in higher network performance than conventional RBFs. This paper introduces a new non-linear spectrum estimation method that consists of LPC analysis and an RBF network. It is shown that accurate spectrum envelopes can be obtained, since the new RBF network can estimate some nonlinearities in speech production.
Chusit PRADABPET Shingo YOSHIZAWA Yoshikazu MIYANAGA Kobchai DEJHAN
In this paper, we propose a new PAPR reduction method that uses a hybrid of the partial transmit sequences (PTS) and adaptive peak power reduction (APPR) methods with a coded side information (SI) technique. These methods are used in an Orthogonal Frequency Division Multiplexing (OFDM) system. OFDM employs orthogonal sub-carriers for data modulation. These sub-carriers can present a large Peak to Average Power Ratio (PAPR) in some cases. In order to reduce the PAPR, the sequence of input data is rearranged by PTS. The APPR method is also used to control the peak level of the modulation signals with an adaptive algorithm. The proposed reduction method combines these two methods and realizes the advantages of both at the same time. Finding the optimum PTS condition for PAPR reduction demands a very large calculation cost, so obtaining the optimum PTS is impractical. In the proposed method, by using a pseudo-optimum condition with a coded SI technique, the total calculation cost is drastically reduced. In simulation results, the proposed method improves the PAPR and also achieves high bit error rate (BER) performance in an OFDM system.
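To make the PAPR and PTS terminology concrete, the sketch below computes the PAPR of one OFDM symbol and runs a plain exhaustive PTS search over phase factors ±1 with an interleaved subcarrier partition. The exhaustive search is exactly the costly baseline that the abstract argues against; the APPR stage, the pseudo-optimum condition, and the coded side information are not modeled. The partition scheme, function names, and test symbol are assumptions made for illustration.

```python
import cmath
import itertools
import math


def idft(freq):
    """Naive inverse DFT; adequate for the tiny symbol sizes used here."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def papr_db(signal):
    """Peak power divided by mean power, in decibels."""
    powers = [abs(s) ** 2 for s in signal]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def pts_search(freq, num_blocks, phases=(1, -1)):
    """Try every phase combination and return (best PAPR in dB, phase weights)."""
    n = len(freq)
    # Interleaved partition: subcarrier k belongs to block k mod num_blocks.
    parts = [idft([freq[k] if k % num_blocks == v else 0 for k in range(n)])
             for v in range(num_blocks)]
    best = None
    # The first block's phase is fixed to 1 without loss of generality.
    for combo in itertools.product(phases, repeat=num_blocks - 1):
        weights = (1,) + combo
        candidate = [sum(w * part[t] for w, part in zip(weights, parts))
                     for t in range(n)]
        score = papr_db(candidate)
        if best is None or score < best[0]:
            best = (score, weights)
    return best


# Hypothetical worst-case symbol: identical data on all 8 subcarriers.
symbol = [1] * 8
print(papr_db(idft(symbol)), pts_search(symbol, num_blocks=4))
```

The search cost grows as the number of phase choices raised to the number of blocks minus one, which is why a reduced or pseudo-optimum search becomes necessary for realistic block counts. The chosen weights must also reach the receiver, which is the role of the side information.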
Shingo YOSHIZAWA Yoshikazu MIYANAGA
We present area- and power-efficient pipeline 128- and 128/64-point fast Fourier transform (FFT) processors for 8x8 multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems based on the specification framework of IEEE 802.11ac WLANs. Our new FFT processors use a mixed-radix multipath delay commutator (MRMDC) architecture from the point of view of low complexity and high memory use. A conventional MRMDC architecture requires large circuits in the delay commutators, which change the order of data sequences for the butterfly units. The proposed architecture replaces delay elements with new commutators that cooperate with other MIMO-OFDM processing blocks. These commutators are inserted at the front and rear of the input and output memory units. Our FFT processors exhibit a 50–51% reduction in logic gates and a 70–72% reduction in power dissipation compared with conventional ones.
Koji SASAKI Nobuhiro MIKI Yoshikazu MIYANAGA
We propose an automatic mesh generation algorithm for a 3-dimensional elliptic model for acoustic analysis of the vocal tract. We mesh the vocal tract and compute the vocal tract transfer function (VTTF) using the Finite Element Method (FEM). We show that there is little difference between the VTTF obtained with our algorithm and that obtained with a manual mesh, especially for the vowel /a/. We show that the number of nodes depends on the shape of the cross section of the vocal tract. Furthermore, we compute the VTTF of a vocal tract whose shape varies continuously.
Jun'ya SHIMIZU Yoshikazu MIYANAGA Koji TOCHINAI
In many practical applications of adaptive filtering, input signals as well as output signals often contain observation noise. Hence, it is necessary to develop an adaptive filtering algorithm for such an errors-in-variables (EIV) model. One solution for identifying the EIV model is a total least squares (TLS) algorithm based on singular value decomposition in off-line processing. However, identification of an EIV IIR system using an adaptive TLS algorithm whose stability is guaranteed during the adaptation process has not been considered. Hence, we propose a normalized lattice IIR adaptive filtering algorithm for TLS parameter estimation. We also show the effectiveness of the proposed algorithm under noisy conditions through simulations.
Xin XU Noboru HAYASAKA Yoshikazu MIYANAGA
This paper proposes a new algorithm named Adaptive Running Spectrum Filtering (ARSF) to restore the amplitude spectra of speech corrupted by additive noise. Based on a prior noise estimate, adaptive filtering is applied to the speech modulation spectra according to the noise conditions. The periodic structures in the amplitude spectra are preserved against noise distortion. Since the amplitude spectral structures contain the information of the fundamental frequency, which is the inverse of the pitch period, the ARSF algorithm is incorporated into robust pitch detection to increase its accuracy. Experimental results show that, compared with conventional methods, the proposed method significantly improves the robustness of pitch detection under several noise types and SNRs.
A new approach to speech feature estimation under noisy conditions is proposed in this paper. It is used in noise-robust continuous speech recognition (CSR). As noise-robust techniques for isolated word speech recognition, the running spectrum analysis (RSA), running spectrum filtering (RSF), and dynamic range adjustment (DRA) methods have been developed. Among them, only RSA has been applied to a CSR system. This paper proposes an extended DRA for a noise-robust CSR system. In the speech recognition stage, a continuous speech waveform is automatically divided into blocks defined by a short time length. The extended DRA is applied to these estimated blocks. The average recognition rate of the proposed method is improved under several different noise conditions. As a result, the recognition rates are improved by up to 15% for various noises at 10 dB SNR.
Yoshikazu MIYANAGA Wataru TAKAHASHI Shingo YOSHIZAWA
This paper introduces the noise-robust speech communication techniques we have developed and describes their implementation in a smart info-media system, i.e., a small robot. Our speech communication system consists of automatic speech detection, recognition, and rejection. By using automatic speech detection and recognition, an observed speech waveform can be recognized without a manual trigger. In addition, using speech rejection, the system accepts only registered speech phrases and rejects any other words. In other words, although an arbitrary input speech waveform can be fed into this system and recognized, the system responds only to the registered speech phrases. The developed noise-robust speech processing can reduce various noises in many environments. In addition to the design of noise-robust speech recognition, the LSI design of this system is introduced. By designing a speech recognition application-specific IC (ASIC), we can simultaneously realize low power consumption and real-time processing. This paper describes the LSI architecture of this system and its performance in some field experiments. In terms of current speech recognition accuracy, the system achieves 85-99% under 0-20 dB SNR and echo environments.