Dinh Thi Thuy NGA MinGon KIM Minho KANG
Sleep-mode operation of a Mobile Subscriber Station (MSS) in IEEE 802.16e effectively saves energy consumption; however, it induces frame response delay. In this letter, we propose an algorithm to quickly find the optimal value of the final sleep interval in sleep-mode in order to minimize energy consumption with respect to a given frame response delay constraint. The validations of our proposed algorithm through analytical results and simulation results suggest that our algorithm provide a potential guidance to energy saving.
Yibo FAN Takeshi IKENAGA Satoshi GOTO
With the increase of key length used in public cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high speed design of Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and one clock cycle delay in dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost and a parallel radix-16 design of data path is proposed to accelerate the speed. By using HHNEC 0.25 µm standard cell library, the implementation results show that the total cost of Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable to be used in high speed RSA or ECC encryption/decryption. As a scalable design, it supports any key-length encryption/decryption up to the size of on-chip memory.
Tobias CINCAREK Hiromichi KAWANAMI Ryuichi NISIMURA Akinobu LEE Hiroshi SARUWATARI Kiyohiro SHIKANO
In this paper, the development, long-term operation and portability of a practical ASR application in a real environment is investigated. The target application is a speech-oriented guidance system installed at the local community center. The system has been exposed to ordinary people since November 2002. More than 300 hours or more than 700,000 inputs have been collected during four years. The outcome is a rare example of a large scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of real-environment data which has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e. speech recognizer and the response generation module, is assessed. Although depending on the system's modeling capacities and domain complexity, experimental results show that overall performance stagnates after employing about 10-15 k utterances for training the acoustic model, 40-50 k utterances for training the language model and 40 k-50 k utterances for compiling the question and answer database. The Q&A database was most important for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype for a different environment, a local subway station, is investigated. Since collection and preparation of large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implicates that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.
This paper presents a novel high-speed low-complexity pipelined degree-computationless modified Euclidean (pDCME) algorithm architecture for high-speed RS decoders. The pDCME algorithm allows elimination of the degree-computation so as to reduce hardware complexity and obtain high-speed processing. A high-speed RS decoder based on the pDCME algorithm has been designed and implemented with 0.13-µm CMOS standard cell technology in a supply voltage of 1.1 V. The proposed RS decoder operates at a clock frequency of 660 MHz and has a throughput of 5.3 Gb/s. The proposed architecture requires approximately 15% fewer gate counts and a simpler control logic than architectures based on the popular modified Euclidean algorithm.
Goshu NAGINO Makoto SHOZAKAI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO
This paper proposes a technique for building an effective speech corpus with lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models into two-dimensional space. At first, a small number of voice samples per speaker is collected; speaker adapted acoustic models trained with collected utterances, are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution, in a plotted map are selected; a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. It means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. It means that a cost reduction of more than 57% was achieved.
Jinsul KIM Hyunwoo LEE Won RYU Seungho HAN Minsoo HAHN
This letter mainly focuses on improving current noise reduction methods to solve the critical speech distortion problems with robust noise reduction in noisy speech signals for speech enhancement over IP networks. For robust noise reduction with packet loss recovery, we propose a novel optimized Wiener filtering technique that uses the estimated SNR (Signal-to-Noise Ratio) with packet loss recovery method which is applied as post-filtering over IP-networks. Simulation results demonstrate that the proposed scheme provides better reduction and recovery rates with considering packet loss and SNR environment than other methods.
Osamu ICHIKAWA Takashi FUKUDA Masafumi NISHIMURA
The accuracy of automatic speech recognition in a car is significantly degraded in a very low SNR (Signal to Noise Ratio) situation such as "Fan high" or "Window open". In such cases, speech signals are often buried in broadband noise. Although several existing noise reduction algorithms are known to improve the accuracy, other approaches that can work with them are still required for further improvement. One of the candidates is enhancement of the harmonic structures in human voices. However, most conventional approaches are based on comb filtering, and it is difficult to use them in practical situations, because their assumptions for F0 detection and for voiced/unvoiced detection are not accurate enough in realistic noisy environments. In this paper, we propose a new approach that does not rely on such detection. An observed power spectrum is directly converted into a filter for speech enhancement, by retaining only the local peaks considered to be harmonic structures in the human voice. In our experiments, this approach reduced the word error rate by 17% in realistic automobile environments. Also, it showed further improvement when used with existing noise reduction methods.
Atsushi SAKO Tetsuya TAKIGUCHI Yasuo ARIKI
In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic transition. By incorporating the concept of topic transition, it is expected that the recognition performance will be improved. Thus, the proposed method employs a "Topic HMM" instead of a history to estimate the topic distribution. The Topic HMM is an Ergodic HMM that expresses typical topic distributions as well as topic transition probabilities. Word accuracy results from our experiments confirmed the superiority of the proposed method over a trigram and a PLSA-based conventional method that uses a recognized history.
Norihide KITAOKA Souta HAMAGUCHI Seiichi NAKAGAWA
To achieve high recognition performance for a wide variety of noise and for a wide range of signal-to-noise ratio, this paper presents methods for integration of four noise reduction algorithms: spectral subtraction with smoothing of time direction, temporal domain SVD-based speech enhancement, GMM-based speech estimation and KLT-based comb-filtering. In this paper, we proposed two types of combination methods of noise suppression algorithms: selection of front-end processor and combination of results from multiple recognition processes. Recognition results on the CENSREC-1 task showed the effectiveness of our proposed methods.
An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected for a number of speakers under actual driving conditions. PCGMM-based feature compensation, considered in this paper, utilizes parallel model combination to generate noise-corrupted speech model by combining clean speech and the noise model. In order to address unknown time-varying background noise, an interpolation method of multiple environmental models is employed. To alleviate computational expenses due to multiple models, an Environment Transition Model is employed, which is motivated from Noise Language Model used in Environmental Sniffing. An environment dependent scheme of mixture sharing technique is proposed and shown to be more effective in reducing the computational complexity. A smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A reduction of 73.10% of the computational requirements was obtained by employing the environment dependent mixture sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.
Masato NAKAZATO Michiko INOUE Satoshi OHTAKE Hideo FUJIWARA
In this paper, we propose a design for testability method for test programs of software-based self-test using test program templates. Software-based self-test using templates has a problem of error masking where some faults detected in a test generation for a module are not detected by the test program synthesized from the test. The proposed method achieves 100% template level fault efficiency, that is, it completely avoids the error masking. Moreover, the proposed method has no performance degradation (adds only observation points) and enables at-speed testing.
Akinori ITO Takanobu OBA Takashi KONASHI Motoyuki SUZUKI Shozo MAKINO
Speech recognition in a noisy environment is one of the hottest topics in the speech recognition research. Noise-tolerant acoustic models or noise reduction techniques are often used to improve recognition accuracy. In this paper, we propose a method to improve accuracy of spoken dialog system from a language model point of view. In the proposed method, the dialog system automatically changes its language model and dialog strategy according to the estimated recognition accuracy in a noisy environment in order to keep the performance of the system high. In a noise-free environment, the system accepts any utterance from a user. On the other hand, the system restricts its grammar and vocabulary in a noisy environment. To realize this strategy, we investigated a method to avoid the user's out-of-grammar utterances through an instruction given by the system to a user. Furthermore, we developed a method to estimate recognition accuracy from features extracted from noise signals. Finally, we realized a proposed dialog system according to these investigations.
Jin-Song ZHANG Xin-Hui HU Satoshi NAKAMURA
Chinese is a representative tonal language, and it has been an attractive topic of how to process tone information in the state-of-the-art large vocabulary speech recognition system. This paper presents a novel way to derive an efficient phoneme set of tone-dependent units to build a recognition system, by iteratively merging a pair of tone-dependent units according to the principle of minimal loss of the Mutual Information (MI). The mutual information is measured between the word tokens and their phoneme transcriptions in a training text corpus, based on the system lexical and language model. The approach has a capability to keep discriminative tonal (and phoneme) contrasts that are most helpful for disambiguating homophone words due to lack of tones, and merge those tonal (and phoneme) contrasts that are not important for word disambiguation for the recognition task. This enables a flexible selection of phoneme set according to a balance between the MI information amount and the number of phonemes. We applied the method to traditional phoneme set of Initial/Finals, and derived several phoneme sets with different number of units. Speech recognition experiments using the derived sets showed its effectiveness.
Yun-Jeong KIM Jong-Ho LEE Ja-Hyun KOO Kwang-Hyun BAEK Suki KIM
In this paper, we describe a 6-bit 1.6-GS/s flash analog to digital converter (ADC). To reduce the power consumption and active area, we propose a new interpolation architecture using a symmetric three-input comparator. This ADC achieves 5.56 effective bits for input frequencies up to 220 MHz at 1.6 GS/s, and almost five effective bits for 660 MHz input at 1.6 GS/s. Peak INL and DNL are less than 0.5 LSB and 0.45 LSB, respectively. This ADC consumes 85 mW from 1.8 V at 1.6 GS/s and occupies an active area of 0.27 mm2. It is fabricated in 0.18-µm CMOS.
Frederic BEAL Tomohiro YONEDA Chris J. MYERS
We present a new framework for checking safety failures. The approach is based on the conservative inference of the internal states of a system by the observation of the interaction with its environment. It is based on two similar mechanisms : forward implication, which performs the analysis of the consequences of an input applied to the system, and backward implication, that performs the same task for an output transition. While being a very simple approach, it is general and we believe it can yield efficient algorithms in different safety-failure checking problems. As a case study, we have applied this framework to an existing problem, the hazard checking in (speed-independent) asynchronous circuits. Our new methodology yields an efficient algorithm that performs better or as well as all existing algorithms, while being more general than the fastest one.
In order to achieve the prioritized quality of service (QoS) guarantee, the IEEE 802.11e EDCAF (the enhanced distributed channel access function) provides the distinguished services by configuring the different QoS parameters to different access categories (ACs). An admission control scheme is needed to maximize the utilization of wireless channel. Most of papers study throughput improvement by solving the complicated multidimensional Markov-chain model. In this paper, we introduce a backoff model to study the transmission probability of the different arbitration interframe space number (AIFSN) and the minimum contention window size (CWmin). We propose an adaptive control scheme (ACS) to dynamically update AIFSN and CWmin based on the periodical monitoring of current channel status and QoS requirements to achieve the specific service differentiation at access points (AP). This paper provides an effective tuning mechanism for improving QoS in WLAN. Analytical and simulation results show that the proposed scheme outperforms the basic EDCAF in terms of throughput and service differentiation especially at high collision rate.
Tran HUY DAT Kazuya TAKEDA Fumitada ITAKURA
We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of speech prior distribution, where the model parameters are adapted from actual noisy speech in a frame-by-frame manner. The utilization of a more general prior distribution with its online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, the multi-channel information in terms of cross-channel statistics are shown to be useful to better adapt the prior distribution parameters to the actual observation, resulting in better performance of speech enhancement algorithm. We tested the proposed algorithm in an in-car speech database and obtained significant improvements of the speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner and open window.
Keiichi FUNAKI Tatsuhiko KINJO
Complex speech analysis for an analytic speech signal can accurately estimate the spectrum in low frequencies since the analytic signal provides spectrum only over positive frequencies. The remarkable feature makes it possible to realize more accurate F0 estimation using complex residual signal extracted by complex-valued speech analysis. We have already proposed F0 estimation using complex LPC residual, in which the autocorrelation function weighted by AMDF was adopted as the criterion. The method adopted MMSE-based complex LPC analysis and it has been reported that it can estimate more accurate F0 for IRS filtered speech corrupted by white Gauss noise although it can not work better for the IRS filtered speech corrupted by pink noise. In this paper, robust complex speech analysis based on ELS (Extended Least Square) method is introduced in order to overcome the drawback. The experimental results for additive white Gauss or pink noise demonstrate that the proposed algorithm based on robust ELS-based complex AR analysis can perform better than other methods.
We propose a fast decoding algorithm for the p-ary first-order Reed-Muller code guaranteeing correction of up to errors and having complexity proportional to nlog n, where n = pm is the code length and p is an odd prime. This algorithm is an extension in the complex domain of the fast Hadamard transform decoding algorithm applicable to the binary case.
Kazuo ONOE Shoei SATO Shinichi HOMMA Akio KOBAYASHI Toru IMAI Tohru TAKAGI
The extraction of acoustic features for robust speech recognition is very important for improving its performance in realistic environments. The bi-spectrum based on the Fourier transformation of the third-order cumulants expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features with averaging features in a single frame. Merged with the conventional Mel frequency cepstral coefficients (MFCC) based on the power spectrum by the principal component analysis (PCA), the proposed features gave a 6.9% relative lower a word error rate in Japanese broadcast news transcription experiments.