IEICE TRANSACTIONS on Information

  • Impact Factor: 0.72
  • Eigenfactor: 0.002
  • Article Influence: 0.1
  • CiteScore: 1.4

Volume E91-D No.3 (Publication Date: 2008/03/01)

    Special Section on Robust Speech Processing in Realistic Environments
  • FOREWORD Open Access

    Kazuya TAKEDA  

     
    FOREWORD

      Page(s):
    391-392
  • Signal Processing Techniques for Robust Speech Recognition

    Futoshi ASANO  

     
    INVITED PAPER

      Page(s):
    393-401

    In this paper, signal processing techniques that can be applied to automatic speech recognition to improve its robustness are reviewed. The choice of signal processing technique depends strongly on the application scenario. The analysis of a scenario and the choice of suitable signal processing techniques are illustrated through two examples.

  • Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models

    Takatoshi JITSUHIRO  Tomoji TORIYAMA  Kiyoshi KOGURE  

     
    PAPER-Noisy Speech Recognition

      Page(s):
    402-410

    We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognition candidates, it is important to suppress many kinds of noise signals at once and to find the target speech. Before noise suppression, to find speech and noise label sequences, we introduce a multi-pass search with acoustic models that include many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is then performed frame-synchronously using the multiple models selected by the recognized label sequences with time alignments. We evaluated this method on the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.

  • Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs

    Norihide KITAOKA  Souta HAMAGUCHI  Seiichi NAKAGAWA  

     
    PAPER-Noisy Speech Recognition

      Page(s):
    411-421

    To achieve high recognition performance for a wide variety of noise types and a wide range of signal-to-noise ratios, this paper presents methods for integrating four noise reduction algorithms: spectral subtraction with smoothing in the time direction, temporal-domain SVD-based speech enhancement, GMM-based speech estimation, and KLT-based comb filtering. We propose two types of combination methods for the noise suppression algorithms: selection of a front-end processor and combination of the results from multiple recognition processes. Recognition results on the CENSREC-1 task show the effectiveness of our proposed methods.
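
    Of the four algorithms above, spectral subtraction has the most standard closed form. The following minimal NumPy sketch illustrates it with first-order smoothing in the time direction; the function name, parameters, and constants are illustrative assumptions, not values from the paper.

    import numpy as np

    def spectral_subtraction(power_spec, noise_psd, alpha=2.0, beta=0.01, smooth=0.9):
        """Spectral subtraction with smoothing in the time direction (sketch).

        power_spec : (T, F) noisy-speech power spectra
        noise_psd  : (F,) noise power estimate, e.g. from non-speech frames
        alpha      : over-subtraction factor (assumed value)
        beta       : spectral floor relative to the noisy frame (assumed value)
        smooth     : first-order smoothing coefficient across frames (assumed)
        """
        power_spec = np.asarray(power_spec, dtype=float)
        enhanced = np.empty_like(power_spec)
        prev = power_spec[0].copy()
        for t, frame in enumerate(power_spec):
            # smooth the noisy spectrum over time before subtracting
            prev = smooth * prev + (1.0 - smooth) * frame
            # flooring keeps the estimate positive and limits musical noise
            enhanced[t] = np.maximum(prev - alpha * noise_psd, beta * frame)
        return enhanced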

  • Robust Speech Recognition by Model Adaptation and Normalization Using Pre-Observed Noise

    Satoshi KOBASHIKAWA  Satoshi TAKAHASHI  

     
    PAPER-Noisy Speech Recognition

      Page(s):
    422-429

    Users require speech recognition systems that offer rapid response and high accuracy concurrently. Speech recognition accuracy is degraded by additive noise, imposed by ambient noise, and by convolutional noise, created by space transfer characteristics, especially in distant-talking situations. Against each type of noise, existing model adaptation techniques achieve robustness by using HMM composition and CMN (cepstral mean normalization). Since they need an additive noise sample as well as a user speech sample to generate the required models, they cannot achieve rapid response, even though it may be possible to capture just the additive noise in a preceding step. The technique proposed herein uses just the additive noise, captured in that preceding step, to generate a model that is adapted and normalized against both types of noise. When the user's speech sample is captured, only online CMN need be performed to start the recognition processing, so the technique offers rapid response. In addition, to cover the unpredictable S/N values possible in real applications, the technique creates several S/N-dependent HMMs. Simulations using artificial speech data show that the proposed technique increased the character correct rate by 11.62% compared to CMN.
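
    As a rough illustration of the online-CMN step that remains at recognition time, the sketch below subtracts a recursively updated cepstral mean; the initialization from the preceding noise-observation step and the forgetting factor are assumptions for illustration.

    import numpy as np

    def online_cmn(cepstra, init_mean, decay=0.995):
        """Online cepstral mean normalization (sketch).

        cepstra   : (T, D) cepstral feature sequence
        init_mean : (D,) initial channel estimate, e.g. prepared in the
                    pre-observation step (assumed interface)
        decay     : forgetting factor of the recursive update (assumed)
        """
        mean = np.asarray(init_mean, dtype=float).copy()
        out = np.empty_like(cepstra, dtype=float)
        for t, c in enumerate(cepstra):
            mean = decay * mean + (1.0 - decay) * c  # running channel estimate
            out[t] = c - mean                        # normalized frame
        return out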

  • Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

    Wooil KIM  John H.L. HANSEN  

     
    PAPER-Noisy Speech Recognition

      Page(s):
    430-438

    An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected from a number of speakers under actual driving conditions. The PCGMM-based feature compensation considered in this paper utilizes parallel model combination to generate a noise-corrupted speech model by combining the clean speech and noise models. In order to address unknown, time-varying background noise, an interpolation method over multiple environmental models is employed. To alleviate the computational expense due to multiple models, an Environment Transition Model is employed, motivated by the Noise Language Model used in Environmental Sniffing. An environment-dependent mixture-sharing scheme is proposed and shown to be more effective in reducing the computational complexity: a smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single-digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A 73.10% reduction of the computational requirements was obtained by employing the environment-dependent mixture-sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.

  • Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

    Tran HUY DAT  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement

      Page(s):
    439-447

    We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. The use of a more general prior distribution with online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, the multichannel information, in the form of cross-channel statistics, is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner noise, and an open window.
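
    For reference, one common parameterization of the generalized gamma prior on the spectral magnitude x >= 0 is (the paper's exact notation may differ):

        p(x) = \frac{\gamma\,\beta^{\nu}}{\Gamma(\nu)}\, x^{\gamma\nu-1} \exp\!\left(-\beta x^{\gamma}\right), \qquad x \ge 0,

    which reduces to a Rayleigh prior for \gamma = 2, \nu = 1 and to a gamma prior for \gamma = 1; the MAP estimate of the magnitude maximizes p(x \mid y) \propto p(y \mid x)\, p(x) in each frequency bin, with (\gamma, \nu, \beta) re-estimated frame by frame.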

  • Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

    Yotaro KUBO  Shigeki OKAWA  Akira KUREMATSU  Katsuhiko SHIRAI  

     
    PAPER-ASR under Reverberant Conditions

      Page(s):
    448-456

    We have attempted to recognize reverberant speech using a novel speech recognition system that depends not only on the spectral envelope and amplitude modulation but also on frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals by discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the information in the zero-crossing points of the carrier signals also plays a significant role in human speech recognition. In realistic environments, a feature that depends on limited properties of the signal may easily be corrupted. In order to use an automatic speech recognizer in an unknown environment, exploiting the information obtained from other signal properties and combining them is important for minimizing the effects of the environment. In this paper, we propose a method to analyze the carrier signals that are discarded in most speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One of the nonlinear discriminant analyzers is HATS, which can capture the amplitude modulation of narrowband signals efficiently. The other is a pseudo-instantaneous frequency analyzer proposed in this paper, which can capture the frequency modulation of narrowband signals efficiently. The two analyzers are combined by the method based on the entropy of the feature introduced by Okawa et al. In this paper, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal (Sect. 2); previous AM analysis methods are described in Sect. 3, the proposed system in Sect. 4, and the experimental setup in Sect. 5, with the results discussed in Sect. 6. We evaluate the performance of the proposed method by continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement over the MFCC-based feature extraction system.

  • Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

    Longbiao WANG  Seiichi NAKAGAWA  Norihide KITAOKA  

     
    PAPER-ASR under Reverberant Conditions

      Page(s):
    457-466

    In a distant-talking environment, the length of the channel impulse response is longer than the short-term spectral analysis window. Conventional short-term-spectrum-based cepstral mean normalization (CMN) is therefore not effective under these conditions. In this paper, we propose a robust speech recognition method that combines short-term-spectrum-based CMN with a long-term counterpart. We assume that a static speech segment (such as a vowel) affected by reverberation can be modeled by a long-term cepstral analysis, so the effect of long reverberation on a static speech segment may be compensated by the long-term-spectrum-based CMN. The cepstral distance of neighboring frames is used to discriminate between static speech segments (long-term spectrum) and non-static speech segments (short-term spectrum), and the cepstra of the static and non-static segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on position-dependent CMN (PDCMN), which compensates for channel distortion depending on speaker position and is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term-spectrum-based CMN is extended to PDCMN; we call this variable-term-spectrum-based PDCMN (VT-PDCMN). Since PDCMN/VT-PDCMN cannot normalize speaker variations, because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We conducted experiments on our proposed method using limited-vocabulary (100 words), distant-talking, isolated word recognition in a real environment. The proposed method achieved a relative error reduction of 60.9% over conventional short-term-spectrum-based CMN and 30.6% over short-term-spectrum-based PDCMN.
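
    A much-simplified sketch of the segment classification described above: frames are split into static and non-static classes by the cepstral distance between neighboring frames and normalized by per-class means. The threshold is an assumed tuning constant, and the paper additionally derives the static-segment cepstra from a long-term analysis window.

    import numpy as np

    def variable_term_cmn(cepstra, dist_thresh=0.5):
        """Classify frames as static/non-static and normalize per class (sketch)."""
        cepstra = np.asarray(cepstra, dtype=float)
        d = np.linalg.norm(np.diff(cepstra, axis=0), axis=1)
        static = np.concatenate([[False], d < dist_thresh])  # first frame: non-static
        out = cepstra.copy()
        for mask in (static, ~static):
            if mask.any():
                out[mask] -= cepstra[mask].mean(axis=0)      # per-class CMN
        return out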

  • Noise Robust Voice Activity Detection Based on Switching Kalman Filter

    Masakiyo FUJIMOTO  Kentaro ISHIZUKA  

     
    PAPER-Voice Activity Detection

      Page(s):
    467-477

    This paper addresses the problem of voice activity detection (VAD) in noisy environments. The VAD method proposed in this paper is based on a statistical model approach, and estimates statistical models sequentially without a priori knowledge of noise. Namely, the proposed method constructs a clean speech/silence state transition model beforehand, and sequentially adapts the model to the noisy environment by using a switching Kalman filter when a signal is observed. In this paper, we carried out two evaluations. In the first, we observed that the proposed method significantly outperforms conventional methods as regards voice activity detection accuracy in simulated noise environments. Second, we evaluated the proposed method on a VAD evaluation framework, CENSREC-1-C. The evaluation results revealed that the proposed method significantly outperforms the baseline results of CENSREC-1-C as regards VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.

  • Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition

    Makoto SAKAI  Norihide KITAOKA  Seiichi NAKAGAWA  

     
    PAPER-Feature Extraction

      Page(s):
    478-487

    Precisely modeling the time dependency of features is one of the important issues in speech recognition. Segmental-unit input HMMs with a dimensionality reduction method have been widely used to address this issue. Linear discriminant analysis (LDA) and its heteroscedastic extensions, e.g., heteroscedastic linear discriminant analysis (HLDA) and heteroscedastic discriminant analysis (HDA), are popular approaches to reducing dimensionality. However, it is difficult to find one particular criterion suitable for every data set when carrying out dimensionality reduction while preserving discriminative information. In this paper, we propose a new framework, which we call power linear discriminant analysis (PLDA). PLDA can describe various criteria, including LDA, HLDA, and HDA, with one control parameter. In addition, we provide an efficient method for selecting the control parameter without training HMMs or testing recognition performance on a development data set. Experimental results show that PLDA is more effective than conventional methods on various data sets.
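
    The control parameter can be pictured as the order m of a weighted matrix power mean of the class covariances (notation assumed here, not taken from the paper):

        \bar{\Sigma}(m) = \Big( \sum_{k} p_k\, \Sigma_k^{\,m} \Big)^{1/m},

    so that m = 1 gives the arithmetic mean underlying LDA, m \to 0 approaches the geometric mean as in HDA-like criteria, and m = -1 gives the harmonic mean; sweeping m traces out a family of dimensionality reduction criteria.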

  • Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors

    Mohammad NURUL HUDA  Muhammad GHULAM  Takashi FUKUDA  Kouichi KATSURADA  Tsuneo NITTA  

     
    PAPER-Feature Extraction

      Page(s):
    488-498

    This paper describes a robust automatic speech recognition (ASR) system with reduced computation. The acoustic models of a hidden Markov model (HMM)-based classifier include various hidden factors such as speaker-specific characteristics, coarticulation, and the acoustic environment. If there exists a canonicalization process that can recover the margin of acoustic likelihoods between correct phonemes and other phonemes that is degraded by these hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors, each corresponding to the canonicalization of one hidden factor, and a DPF selector that selects an optimum DPF vector as the input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors, by applying canonicalization based on the DPF extractors and two-stage Wiener filtering. In experiments on AURORA-2J, the proposed method provides higher word accuracy under clean training and a significant improvement in word accuracy at low signal-to-noise ratios (SNRs) under multi-condition training, compared to a standard ASR system with mel-frequency cepstral coefficient (MFCC) parameters. Moreover, the proposed method requires only two-fifths as many Gaussian mixture components and less memory to achieve accurate ASR.

  • Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training

    Tobias CINCAREK  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Acoustic Modeling

      Page(s):
    499-507

    Development of an ASR application such as a speech-oriented guidance system for a real environment is expensive. Most of the cost is due to human labeling of newly collected speech data to construct the acoustic model for speech recognition. Employing existing models or sharing models across multiple applications is often difficult, because the characteristics of speech depend on various factors such as the possible users, their speaking style, and the acoustic environment. Therefore, this paper proposes a combination of unsupervised learning and selective training to reduce development costs. Employing unsupervised learning alone is problematic due to the task dependency of speech recognition and because automatic transcription of speech is error-prone. A theoretically well-defined approach to the automatic selection of high-quality, task-specific speech data from an unlabeled data pool is presented: only those unlabeled data which increase the model likelihood given the labeled data are employed for unsupervised training. The effectiveness of the proposed method is investigated with a simulation experiment in which adult and child acoustic models are constructed for a speech-oriented guidance system. A completely human-labeled database containing real-environment data collected over two years is available for the development simulation. It is shown experimentally that selective training alleviates the problems of unsupervised learning, i.e., it is possible to select speech utterances of a certain speaker group while discarding noise inputs and utterances with lower recognition accuracy. The simulation experiment is carried out for several combinations of data collection and human transcription periods. It is found empirically that the proposed method is especially effective if only relatively few of the collected data can be labeled and transcribed by humans.
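
    A toy, single-Gaussian stand-in for the selection criterion above (the paper's criterion is defined on HMM likelihoods; the pooling model and names here are assumptions):

    import numpy as np

    def likelihood_gain(labeled, candidate):
        """Change in log-likelihood of the labeled data when the candidate
        utterance is pooled into a diagonal single-Gaussian model (sketch).

        labeled, candidate : (N, D) and (M, D) feature arrays
        """
        def loglik(data, mean, var):
            return -0.5 * np.sum(np.log(2 * np.pi * var) + (data - mean) ** 2 / var)

        m0, v0 = labeled.mean(0), labeled.var(0) + 1e-6
        pooled = np.vstack([labeled, candidate])
        m1, v1 = pooled.mean(0), pooled.var(0) + 1e-6
        return loglik(labeled, m1, v1) - loglik(labeled, m0, v0)

    # keep only candidates whose inclusion raises the labeled-data likelihood:
    # selected = [c for c in pool if likelihood_gain(labeled, c) > 0]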

  • Using Mutual Information Criterion to Design an Efficient Phoneme Set for Chinese Speech Recognition

    Jin-Song ZHANG  Xin-Hui HU  Satoshi NAKAMURA  

     
    PAPER-Acoustic Modeling

      Page(s):
    508-513

    Chinese is a representative tonal language, and how to process tone information in a state-of-the-art large-vocabulary speech recognition system has been an attractive research topic. This paper presents a novel way to derive an efficient phoneme set of tone-dependent units for building a recognition system, by iteratively merging pairs of tone-dependent units according to the principle of minimal loss of mutual information (MI). The mutual information is measured between the word tokens and their phoneme transcriptions in a training text corpus, based on the system lexicon and language model. The approach is able to keep the discriminative tonal (and phoneme) contrasts that are most helpful for disambiguating homophone words that would otherwise be identical without tones, while merging those tonal (and phoneme) contrasts that are not important for word disambiguation in the recognition task. This enables a flexible selection of the phoneme set according to a balance between the amount of mutual information and the number of phonemes. We applied the method to the traditional phoneme set of Initials/Finals and derived several phoneme sets with different numbers of units. Speech recognition experiments using the derived sets showed the effectiveness of the method.
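
    Since the transcription is a deterministic function of the word, the MI between words and transcriptions drops exactly when a merge creates new homophones; the greedy step picks the unit pair whose merge keeps the most MI. A simplified sketch (variable names assumed):

    import math
    from collections import defaultdict
    from itertools import combinations

    def mutual_information(lexicon, p_word):
        """I(W;T) = H(W) - H(W|T); homophones become indistinguishable.

        lexicon : dict word -> tuple of phoneme units
        p_word  : dict word -> unigram probability (sums to 1)
        """
        groups = defaultdict(list)
        for w, pron in lexicon.items():
            groups[pron].append(p_word[w])
        h_w = -sum(p * math.log2(p) for p in p_word.values())
        h_w_t = 0.0
        for probs in groups.values():
            pt = sum(probs)
            h_w_t -= sum(p * math.log2(p / pt) for p in probs)
        return h_w - h_w_t

    def merge_one_pair(lexicon, p_word, units):
        """Merge the unit pair (b -> a) that loses the least MI."""
        def merged(a, b):
            return {w: tuple(a if u == b else u for u in pron)
                    for w, pron in lexicon.items()}
        candidates = (merged(a, b) for a, b in combinations(sorted(units), 2))
        return max(candidates, key=lambda lex: mutual_information(lex, p_word))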

  • Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval

    Qingqing ZHANG  Jielin PAN  Yang LIN  Jian SHAO  Yonghong YAN  

     
    PAPER-Acoustic Modeling

      Page(s):
    514-521

    In recent decades, there has been a great deal of research into bilingual speech recognition: developing a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained Mandarin-English bilingual Speech Recognition System (MESRS) for real-world music retrieval. Two of the main difficulties in building bilingual speech recognition systems for real-world applications are tackled in this paper. One is balancing the performance and the complexity of the bilingual speech recognition system; the other is effectively dealing with matrix-language accents in the embedded language. In order to process intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a compact single set of bilingual acoustic models derived by phone set merging and clustering is developed, instead of using two separate monolingual model sets. In our study, a novel two-pass phone clustering method based on a confusion matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments show that TCM achieves better performance. Since potential system users' native language is Mandarin, which is regarded as the matrix language in our application, their pronunciations of English, the embedded language, usually carry Mandarin accents. In order to deal with these accents, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as maximum a posteriori (MAP). With the effective incorporation of the phone clustering and non-native adaptation approaches, the phrase error rate (PER) of MESRS for English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. The performance on bilingual utterances achieved a 22.37% relative PER reduction.

  • Language Modeling Using PLSA-Based Topic HMM

    Atsushi SAKO  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Language Modeling

      Page(s):
    522-528

    In this paper, we propose a PLSA-based language model for sports-related live speech. The model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from the recognized transcription history. This method can improve the performance, but it cannot express topic transitions. By incorporating the concept of topic transition, the recognition performance is expected to improve. Thus, the proposed method employs a "Topic HMM" instead of a history to estimate the topic distribution. The Topic HMM is an ergodic HMM that expresses typical topic distributions as well as topic transition probabilities. Word accuracy results from our experiments confirmed the superiority of the proposed method over a trigram and a conventional PLSA-based method that uses the recognized history.
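
    Unigram rescaling is commonly written as (notation assumed; Z normalizes over the vocabulary):

        P(w \mid h, t) = \frac{1}{Z(h, t)}\; \frac{P_{\mathrm{PLSA}}(w \mid t)}{P_{\mathrm{unigram}}(w)}\; P_{n\text{-}\mathrm{gram}}(w \mid h),

    where in the proposed method the topic distribution t comes from the Topic HMM state rather than from the recognized history.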

  • A One-Pass Real-Time Decoder Using Memory-Efficient State Network

    Jian SHAO  Ta LI  Qingqing ZHANG  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-ASR System Architecture

      Page(s):
    529-537

    This paper presents a decoder we developed that statically optimizes some of the knowledge sources while handling the others dynamically. The lexicon, phonetic contexts, and acoustic model are statically integrated to form a memory-efficient state network, while the language model (LM) is dynamically incorporated on the fly by means of extended tokens. The novelties of our approach to constructing the state network are (1) introducing two layers of dummy nodes to cluster the cross-word (CW) context-dependent fan-in and fan-out triphones, (2) introducing a so-called "WI layer" to store the word identities, placing the nodes of this layer in the non-shared middle part of the network, and (3) optimizing the network at the state level by a sufficient forward and backward node-merging process. The state network is organized as a multi-layer structure for distinct token propagation at each layer. By exploiting the characteristics of the state network, several techniques, including LM look-ahead, an LM cache, and beam pruning, are specially designed for search efficiency. In particular, for beam pruning, a layer-dependent pruning method is proposed to further reduce the search space. Layer-dependent pruning takes account of the neck-like characteristics of the WI layer and the reduced variety of word endings, which enables a tighter beam without introducing many search errors. In addition, other techniques, including LM compression, lattice-based bookkeeping, and lattice garbage collection, are employed to reduce the memory requirements. Experiments are carried out on a Mandarin spontaneous speech recognition task where the decoder uses a trigram LM and CW triphone models. A comparison with HDecode of the HTK toolkit shows that, within 1% performance deviation, our decoder runs 5 times faster with half the memory footprint.

  • Selection of Optimum Vocabulary and Dialog Strategy for Noise-Robust Spoken Dialog Systems

    Akinori ITO  Takanobu OBA  Takashi KONASHI  Motoyuki SUZUKI  Shozo MAKINO  

     
    PAPER-ASR System Architecture

      Page(s):
    538-548

    Speech recognition in noisy environments is one of the hottest topics in speech recognition research. Noise-tolerant acoustic models or noise reduction techniques are often used to improve recognition accuracy. In this paper, we propose a method to improve the accuracy of a spoken dialog system from a language model point of view. In the proposed method, the dialog system automatically changes its language model and dialog strategy according to the recognition accuracy estimated in a noisy environment, in order to keep the performance of the system high. In a noise-free environment, the system accepts any utterance from a user; in a noisy environment, the system restricts its grammar and vocabulary. To realize this strategy, we investigated a method to avoid the user's out-of-grammar utterances through instructions given by the system to the user. Furthermore, we developed a method to estimate recognition accuracy from features extracted from the noise signals. Finally, we implemented the proposed dialog system based on these investigations.

  • Evaluation of a Noise-Robust Multi-Stream Speaker Verification Method Using F0 Information

    Taichi ASAMI  Koji IWANO  Sadaoki FURUI  

     
    PAPER-Speaker Verification

      Page(s):
    549-557

    We have previously proposed a noise-robust speaker verification method using fundamental frequency (F0) extracted using the Hough transform; the method also incorporates automatic stream-weight and decision threshold estimation. It has been confirmed that the proposed method is effective for white noise under various SNR conditions. This paper evaluates the proposed method under more practical in-car and elevator-hall noise conditions. The paper first describes the noise-robust F0 extraction method and the details of our robust speaker verification method, which uses multi-stream HMMs to integrate the extracted F0 and cepstral features. Details of the automatic stream-weight and threshold estimation method for the multi-stream speaker verification framework are also explained; this method simultaneously optimizes the stream weights and a decision threshold by combining linear discriminant analysis (LDA) and the AdaBoost technique. Experiments were conducted using Japanese connected-digit speech contaminated by white, in-car, or elevator-hall noise at various SNRs. The experimental results show that the F0 features improve verification performance in various noisy environments, and that our stream-weight and threshold optimization method effectively estimates the control parameters so that FARs and FRRs are adjusted to achieve equal error rates (EERs) under various noisy conditions.
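
    In a multi-stream HMM, the cepstral and F0 streams are typically combined log-linearly with a stream weight \lambda (notation assumed):

        \log b_j(o_t) = \lambda \log b_j^{\mathrm{cep}}(o_t^{\mathrm{cep}}) + (1 - \lambda) \log b_j^{\mathrm{F0}}(o_t^{\mathrm{F0}}),

    and the method above jointly estimates \lambda and the decision threshold via LDA and AdaBoost.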

  • Speaker Verification in Realistic Noisy Environment in Forensic Science

    Toshiaki KAMADA  Nobuaki MINEMATSU  Takashi OSANAI  Hisanori MAKINAE  Masumi TANIMOTO  

     
    PAPER-Speaker Verification

      Page(s):
    558-566

    In forensic speaker verification over telephone speech, we may be requested to identify a speaker in a very noisy environment, unlike the conditions assumed in most research. In a noisy environment, the speech is usually first processed by clarification (speech enhancement). However, a previous study of speaker verification from clarified speech did not yield satisfactory results. In this study, we experimented on speaker verification with clarification of speech in a noisy environment, and examined the relationship between improved acoustic quality and speaker verification results. Moreover, experiments with realistic noise, such as a crime-prevention alarm and power supply noise, were conducted, and speaker verification accuracy in a realistic environment was examined. We confirmed the validity of speaker verification with clarification of speech in a realistic noisy environment.

  • Automatic Language Identification with Discriminative Language Characterization Based on SVM

    Hongbin SUO  Ming LI  Ping LU  Yonghong YAN  

     
    PAPER-Language Identification

      Page(s):
    567-575

    Robust automatic language identification (LID) is the task of identifying the language of a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition followed by language modeling (PPRLM), support vector machines (SVMs), and Gaussian mixture models (GMMs). These systems map the cepstral features of spoken utterances into high-level scores via classifiers. In this paper, in order to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate discriminative language characterization score vectors (DLCSV). Back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space, and the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the described system produces results comparable to those of existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task, outperforming state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM systems achieve EERs of 5.1% and 5.0%, respectively.

  • Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System

    Tobias CINCAREK  Hiromichi KAWANAMI  Ryuichi NISIMURA  Akinobu LEE  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Applications

      Page(s):
    576-587

    In this paper, the development, long-term operation, and portability of a practical ASR application in a real environment are investigated. The target application is a speech-oriented guidance system installed at a local community center. The system has been exposed to ordinary users since November 2002, and more than 300 hours, or more than 700,000 inputs, have been collected over four years. The outcome is a rare example of a large-scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improved during the first two years of operation. The purpose is to determine empirically the amount of real-environment data that has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i.e., the speech recognizer and the response generation module, is assessed. Although the outcome depends on the system's modeling capacity and domain complexity, the experimental results show that overall performance stagnates after employing about 10-15k utterances for training the acoustic model, 40-50k utterances for training the language model, and 40-50k utterances for compiling the question-and-answer database. The Q&A database was most important for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype to a different environment, a local subway station, is investigated. Since the collection and preparation of large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is the domain difference between the two systems, since they are installed in different environments. This implies that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.

  • Bilingual Cluster Based Models for Statistical Machine Translation

    Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Applications

      Page(s):
    588-597

    We propose a domain-specific model for statistical machine translation. It is well known that domain-specific language models perform well in automatic speech recognition. We show that domain-specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain-specific models. The first is data sparseness; we employ an adaptation technique to overcome it. The second is domain prediction: in order to perform adaptation, the domain must be provided, but in many cases the domain is not known or changes dynamically. In these cases, not only the translation of the source sentence but also its domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora, and each sub-corpus is treated as a domain. The domain of a source sentence is predicted from its similarity to the sub-corpora, and the language and translation models specific to the predicted domain (sub-corpus) are then used for decoding. This approach gave an improvement of 2.7 BLEU points on the IWSLT05 Japanese-to-English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster-based models.
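
    A minimal sketch of the domain prediction step, scoring the source sentence against a smoothed unigram model of each sub-corpus (a stand-in for the paper's similarity measure; all names are illustrative):

    import math
    from collections import Counter

    def predict_domain(sentence, cluster_unigrams):
        """Return the sub-corpus whose unigram model best explains the sentence.

        cluster_unigrams : dict domain -> Counter of word counts
        """
        def logprob(words, counts):
            total = sum(counts.values())
            vocab = len(counts) + 1
            # add-one smoothing so unseen words do not zero out a domain
            return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)

        words = sentence.split()
        return max(cluster_unigrams, key=lambda d: logprob(words, cluster_unigrams[d]))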

  • Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria

    Yuki DENDA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Applications

      Page(s):
    598-606

    This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer contains two innovations. The first is robust omnidirectional audio and visual features: direction-of-arrival (DOA) estimation using an equilateral-triangular microphone array and human position estimation using an omnidirectional video camera extract the AV features. The second is a dynamic fusion of the AV features: a validity criterion, called the audio- or visual-localization counter, validates each audio or visual feature, and a reliability criterion, called the speech arriving evaluator, acts as a dynamic weight that frees the fusion procedure from prior statistical properties. The proposed localizer can achieve both talker localization during speech activity and user localization during non-speech activity under the same fusion rule. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the talker localization performance of the proposed AV localizer using the validity and reliability criteria is superior to that of conventional localizers.

  • Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

    Goshu NAGINO  Makoto SHOZAKAI  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Corpus

      Page(s):
    607-614

    This paper proposes a technique for building an effective speech corpus at lower cost by utilizing a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker, and speaker-adapted acoustic models trained with the collected utterances are mapped into the two-dimensional space using the statistical multidimensional scaling method. Next, the speakers located in the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting a sufficient number of voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers, a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers, a cost reduction of more than 57%.
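
    The speaker selection step can be sketched as taking the points farthest from the centroid of the two-dimensional model map (a simplified rule for "periphery of the distribution"; names are illustrative):

    import numpy as np

    def peripheral_speakers(coords, n_select):
        """Pick the n_select speakers on the periphery of the 2-D map.

        coords : (S, 2) coordinates of speaker-adapted models from MDS
        """
        center = coords.mean(axis=0)
        dist = np.linalg.norm(coords - center, axis=1)
        return np.argsort(dist)[::-1][:n_select]   # farthest first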

  • An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus

    Jin-Song ZHANG  Satoshi NAKAMURA  

     
    PAPER-Corpus

      Page(s):
    615-630

    An efficient way to develop large-scale speech corpora is to collect phonetically rich ones, i.e., corpora with high coverage of phonetic contextual units. The sentence set, usually called the minimum set, should have a small text size in order to reduce the collection cost; it can be selected from a large mother text corpus by a greedy search algorithm. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units increases dramatically, making the search nontrivial. In order to improve the search efficiency, we previously proposed a least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluates these algorithms in order to show their different characteristics. The experimental results show that the least-to-most-ordered methods achieve smaller objective sets in significantly less computation time than the conventional ones. The algorithm has already been applied to the development of a number of speech corpora, including ATRPTH, a large-scale phonetically rich Chinese speech corpus that played an important role in developing our multi-language translation system.
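
    The conventional baseline that the least-to-most ordering accelerates is the standard greedy covering loop sketched below; the paper's improvement lies in the order in which candidate units are examined.

    def greedy_select(sentences, units_of):
        """Repeatedly pick the sentence covering the most new contextual units.

        sentences : iterable of sentence ids
        units_of  : dict sentence id -> set of phonetic contextual units
        """
        covered, order, pool = set(), [], set(sentences)
        while pool:
            best = max(pool, key=lambda s: len(units_of[s] - covered))
            gain = units_of[best] - covered
            if not gain:
                break                # nothing new is covered; stop
            covered |= gain
            order.append(best)
            pool.remove(best)
        return order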

  • Bi-Spectral Acoustic Features for Robust Speech Recognition

    Kazuo ONOE  Shoei SATO  Shinichi HOMMA  Akio KOBAYASHI  Toru IMAI  Tohru TAKAGI  

     
    LETTER

      Page(s):
    631-634

    The extraction of acoustic features that are robust in realistic environments is very important for improving speech recognition performance. The bi-spectrum, based on the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features, averaging the features within a single frame. Merged with the conventional mel-frequency cepstral coefficients (MFCC), which are based on the power spectrum, by principal component analysis (PCA), the proposed features gave a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
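
    For reference, the bi-spectrum of a zero-mean signal x(n) is the two-dimensional Fourier transform of its third-order cumulant (notation assumed):

        c_3(\tau_1, \tau_2) = E[\, x(n)\, x(n+\tau_1)\, x(n+\tau_2) \,], \qquad B(f_1, f_2) = \sum_{\tau_1} \sum_{\tau_2} c_3(\tau_1, \tau_2)\, e^{-j 2\pi (f_1 \tau_1 + f_2 \tau_2)},

    which, unlike the power spectrum, vanishes for Gaussian signals and retains phase information.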

  • Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment

    Osamu ICHIKAWA  Takashi FUKUDA  Masafumi NISHIMURA  

     
    LETTER

      Page(s):
    635-639

    The accuracy of automatic speech recognition in a car is significantly degraded in very low SNR (signal-to-noise ratio) situations, such as "fan high" or "window open," in which the speech signals are often buried in broadband noise. Although several existing noise reduction algorithms are known to improve the accuracy, other approaches that can work together with them are still required for further improvement. One candidate is enhancement of the harmonic structure of the human voice. However, most conventional approaches are based on comb filtering, which is difficult to use in practical situations because its assumptions for F0 detection and voiced/unvoiced detection are not accurate enough in realistic noisy environments. In this paper, we propose a new approach that does not rely on such detection: an observed power spectrum is directly converted into a filter for speech enhancement by retaining only the local peaks considered to be harmonic structure of the human voice. In our experiments, this approach reduced the word error rate by 17% in realistic automobile environments, and it showed further improvement when used with existing noise reduction methods.
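
    A minimal sketch of the peak-retention idea: the observed power spectrum itself supplies the per-bin gain, with no F0 or voiced/unvoiced decision. The floor constant and the three-point peak test are assumptions; the paper's filter design is more elaborate.

    import numpy as np

    def local_peak_filter(power_spec, floor=0.1):
        """Per-bin gain that keeps local peaks (candidate harmonics).

        power_spec : (F,) power spectrum of one frame
        floor      : gain applied off-peak (assumed constant)
        """
        p = np.asarray(power_spec, dtype=float)
        peaks = np.zeros_like(p, dtype=bool)
        peaks[1:-1] = (p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:])
        return np.where(peaks, 1.0, floor)

    # enhanced = power_spec * local_peak_filter(power_spec)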

  • Special Section on Test and Verification of VLSIs
  • FOREWORD Open Access

    Seiji KAJIHARA  Michiko INOUE  

     
    FOREWORD

      Page(s):
    640-641
  • A Conservative Framework for Safety-Failure Checking

    Frederic BEAL  Tomohiro YONEDA  Chris J. MYERS  

     
    PAPER-Verification and Timing Analysis

      Page(s):
    642-654

    We present a new framework for checking safety failures. The approach is based on conservative inference of the internal states of a system through observation of its interaction with the environment. It relies on two similar mechanisms: forward implication, which analyzes the consequences of an input applied to the system, and backward implication, which performs the same task for an output transition. While very simple, the approach is general, and we believe it can yield efficient algorithms for various safety-failure checking problems. As a case study, we have applied this framework to an existing problem: hazard checking in (speed-independent) asynchronous circuits. Our new methodology yields an efficient algorithm that performs as well as or better than all existing algorithms, while being more general than the fastest one.

  • Timing Analysis Considering Temporal Supply Voltage Fluctuation

    Masanori HASHIMOTO  Junji YAMAGUCHI  Takashi SATO  Hidetoshi ONODERA  

     
    PAPER-Verification and Timing Analysis

      Page(s):
    655-660

    This paper proposes an approach to cope with temporal power/ground voltage fluctuation in static timing analysis. The proposed approach replaces temporal noise with an equivalent power/ground voltage. This replacement reduces the complexity that comes from the variety of noise waveform shapes and improves the compatibility of power/ground-noise-aware timing analysis with the conventional timing analysis framework. Experimental results show that the proposed approach can compute gate propagation delay under temporal noise within 10% error in the worst case and 0.5% on average.

  • A Method of Locating Open Faults on Incompletely Identified Pass/Fail Information

    Koji YAMAZAKI  Yuzo TAKAMATSU  

     
    PAPER-Fault Diagnosis

      Page(s):
    661-666

    In order to reduce test cost, built-in self-test (BIST) is widely used. One of the serious problems of BIST is that the compacted signature carries very little information for fault diagnosis; in particular, it is difficult to determine which tests detect a fault. Therefore, it is important to develop an efficient fault diagnosis method that uses incompletely identified pass/fail information, where "incompletely identified" means that a failing test block consists of at least one failing test and possibly some passing tests, while all of the tests in passing test blocks are passing tests. In this paper, we propose a method to locate open faults by using incompletely identified pass/fail information. Experimental results for the ISCAS'85 and ITC'99 benchmark circuits show that the number of candidate faults is reduced to fewer than 5 in many cases.

  • A Novel Per-Test Fault Diagnosis Method Based on the Extended X-Fault Model for Deep-Submicron LSI Circuits

    Yuta YAMATO  Yusuke NAKAMURA  Kohei MIYASE  Xiaoqing WEN  Seiji KAJIHARA  

     
    PAPER-Fault Diagnosis

      Page(s):
    667-674

    Per-test diagnosis based on the X-fault model is an effective approach for circuits with physical defects of non-deterministic logic behavior. However, the extensive use of vias and buffers in deep-submicron circuits and the unpredictable order relation among the threshold voltages at the fanout branches of a gate have not been fully addressed by conventional per-test X-fault diagnosis. To take these factors into consideration, this paper proposes an improved per-test X-fault diagnosis method featuring (1) an extended X-fault model that handles vias and buffers and (2) the use of occurrence probabilities of logic behaviors for a physical defect to handle the unpredictable relation among threshold voltages. Experimental results show the effectiveness of the proposed method.

  • Fault Diagnosis on Multiple Fault Models by Using Pass/Fail Information

    Yuzo TAKAMATSU  Hiroshi TAKAHASHI  Yoshinobu HIGAMI  Takashi AIKYO  Koji YAMAZAKI  

     
    PAPER-Fault Diagnosis

      Page(s):
    675-682

    In general, we do not know which fault model can explain the cause of the faulty values at the primary outputs of a circuit under test before starting diagnosis. Moreover, in a built-in self-test (BIST) environment, it is difficult to know which primary output takes a faulty value when a failing test pattern is applied. In this paper, we propose an effective diagnosis method over multiple fault models, based only on pass/fail information for the applied test patterns. The proposed method deduces both the fault model and the fault location from the number of detections of the single stuck-at fault at each line, obtained by performing single stuck-at fault simulation with both the passing and the failing test patterns. To improve the diagnosis, our method also uses the logic values of lines and whether the stuck-at faults at those lines are detected by the passing and failing test patterns. Experimental results show that our method accurately identifies the fault model (stuck-at, AND/OR bridging, dominance bridging, or open fault model) for 90% of the faulty circuits, and that the faulty sites are located within two candidate faults.

  • On Detection of Bridge Defects with Stuck-at Tests

    Kohei MIYASE  Kenta TERASHIMA  Xiaoqing WEN  Seiji KAJIHARA  Sudhakar M. REDDY  

     
    PAPER-Defect-Based Testing

      Page(s):
    683-689

    If a test set were generated for faults more complex than stuck-at faults, higher defect coverage would be obtained. Such a test set, however, would have a large number of test vectors, and hence the test costs would go up. In this paper, we propose a method to detect bridge defects with a test set initially generated for stuck-at faults in a full-scan sequential circuit. The proposed method does not add new test vectors to the test set but modifies the existing ones, so there is no negative impact on test data volume or test application time, and the initial stuck-at fault coverage of the test set is preserved by the modified vectors. We focus on detecting as many non-feedback AND-type, OR-type, and 4-way bridging faults as possible. Experimental results show that the proposed method increases the defect coverage.

  • Fault Simulation and Test Generation for Transistor Shorts Using Stuck-at Test Tools

    Yoshinobu HIGAMI  Kewal K. SALUJA  Hiroshi TAKAHASHI  Shin-ya KOBAYASHI  Yuzo TAKAMATSU  

     
    PAPER-Defect-Based Testing

      Page(s):
    690-699

    This paper presents methods for detecting transistor short faults using logic-level fault simulation and test generation. The paper considers two types of transistor-level faults, strong shorts and weak shorts, which were introduced in our previous research and are defined based on the output values of faulty gates. The proposed fault simulation and test generation are performed using gate-level tools designed for stuck-at faults; no transistor-level tools are required. In the test generation process, a circuit is modified by inserting inverters, and a stuck-at test generator is used. This modification is not a design-for-testability technique, as the modified circuit is used only during test generation. Generated test patterns are then compacted by fault simulation. Also, since the weak short model involves uncertainty in its behavior, we define fault coverage and fault efficiency in three different ways, namely optimistic, pessimistic, and probabilistic, and assess them. Finally, experimental results for the ISCAS benchmark circuits demonstrate the effectiveness of the proposed methods.

  • Ramp Voltage Testing for Detecting Interconnect Open Faults

    Yukiya MIURA  

     
    PAPER-Defect-Based Testing

      Page(s):
    700-705

    A method for detecting interconnect open faults in CMOS combinational circuits by applying a ramp voltage to the power supply terminal is proposed. The method automatically assigns a known logic value to a fault location by applying the ramp voltage; as a result, it requires only one test vector to detect a fault, as either a delay fault or an erroneous logic value at the primary outputs. In this paper, we show the fault detectability and effectiveness of the proposed method by simulation-based and theoretical analyses. We also show that the method is applicable to every fault location in a circuit and to open faults of any value. Finally, we show ATPG results suited to the proposed method.

  • Study on Expansion of Convolutional Compactors over Galois Field

    Masayuki ARAI  Satoshi FUKUMOTO  Kazuhiko IWASAKI  

     
    PAPER-Test Compression

      Page(s):
    706-712

    Convolutional compactors offer a promising technique for compacting test responses. In this study we expand the architecture of the convolutional compactor over a Galois field in order to improve the compaction ratio as well as reduce the X-masking probability, namely, the probability that an error is masked by unknown values. While each scan chain is independently connected by EOR gates in the conventional arrangement, the proposed scheme treats q signals as an element of GF(2^q), and the connections are configured over the same field. We show the arrangement of the proposed compactors and the equivalent expression over GF(2). We then evaluate the effectiveness of the proposed expansion in terms of X-masking probability by simulations with a uniform distribution of X-values, as well as the reduction of hardware overhead. Furthermore, we evaluate a multi-weight arrangement of the proposed compactors for non-uniform X distributions.
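
    For concreteness, arithmetic on q-bit groups in GF(2^q) can be sketched as a carry-less multiplication reduced by a primitive polynomial; the q = 4 polynomial below (x^4 + x + 1) is an illustrative choice, not necessarily the paper's.

    def gf_mul(a, b, q=4, poly=0b10011):
        """Multiply a, b in GF(2^q), integers in [0, 2^q); poly is primitive.
        In the proposed compactor, q scan signals form one such element."""
        r = 0
        while b:
            if b & 1:
                r ^= a           # add (XOR) the current shift of a
            a <<= 1
            if a & (1 << q):
                a ^= poly        # reduce modulo the primitive polynomial
            b >>= 1
        return r

    # e.g. gf_mul(0b0010, 0b1000) == 0b0011, i.e. x * x^3 = x + 1 in GF(16)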

  • An Architecture of Embedded Decompressor with Reconfigurability for Test Compression

    Hideyuki ICHIHARA  Tomoyuki SAIKI  Tomoo INOUE  

     
    PAPER-Test Compression

      Page(s):
    713-719

    Test compression/decompression schemes for reducing the test application time and the memory requirements of an LSI tester have been proposed. In such schemes, the coding algorithm is tailored to the given test data so that it can compress them highly. However, these methods have drawbacks; e.g., the tailored coding algorithm is ineffective for test data other than the given test data. In this paper, we introduce an embedded decompressor that is reconfigurable according to the coding algorithm and the given test data; its reconfigurability can overcome the drawbacks of conventional decompressors while keeping a high compression ratio. Moreover, we propose an architecture of reconfigurable decompressors for four variable-length codings. In the proposed architecture, the functions common to the four codings are implemented as fixed (non-reconfigurable) components so as to reduce the configuration data, which is stored on an ATE and sent to a CUT. Experimental results show that (1) the configuration data size becomes reasonably small by reducing the configuration part of the decompressor, (2) the reconfigurable decompressor is effective for SoC testing with respect to test data size, and (3) it can achieve optimal compression of the test data by Huffman coding.

  • Study on Test Data Reduction Combining Illinois Scan and Bit Flipping

    Masayuki ARAI  Satoshi FUKUMOTO  Kazuhiko IWASAKI  

     
    PAPER-Test Compression

      Page(s):
    720-725

    In this paper, we propose a test data reduction scheme that uses a broadcaster along with a bit-flipping circuit. The proposed scheme can reduce test data without degrading the fault coverage of ATPG and without requiring any modification of the CUT. We theoretically analyze the test data size under the proposed scheme. Numerical examples obtained by the analysis, together with experimental results, show that our scheme can effectively reduce test data if the care-bit rate is not too low relative to the number of scan chains. We also discuss a hybrid scheme combining random-pattern-based flipping and single-input-based flipping.

  • Test Data Compression for Scan-Based BIST Aiming at 100x Compression Rate

    Masayuki ARAI  Satoshi FUKUMOTO  Kazuhiko IWASAKI  Tatsuru MATSUO  Takahisa HIRAIDE  Hideaki KONISHI  Michiaki EMORI  Takashi AIKYO  

     
    PAPER-Test Compression

      Page(s):
    726-735

    We developed a test data compression scheme for scan-based BIST, aiming to compress test stimuli and responses by more than 100 times. As the scan-BIST architecture, we adopt BIST-Aided Scan Test (BAST) and combine four techniques: the invert-and-shift operation, run-length compression, scan address partitioning, and LFSR pre-shifting. Our scheme achieves a 100x compression rate in environments where Xs do not occur, without reducing the fault coverage of the original ATPG vectors. Furthermore, we enhanced the masking logic to reduce the X-masking data so that the test data is still compressed to 1/100 in a practical environment where Xs do occur. We applied our scheme to five real VLSI chips, and the technique compressed the test data by 100x for scan-based BIST.

  • Scheduling Power-Constrained Tests through the SoC Functional Bus

    Fawnizu Azmadi HUSSIN  Tomokazu YONEDA  Alex ORAILOLU  Hideo FUJIWARA  

     
    PAPER-High-Level Testing

      Page(s):
    736-746

    This paper proposes a test methodology for core-based testing of systems-on-chip that utilizes the functional bus as a test access mechanism. The functional bus is used as a transportation channel for test stimuli and responses between a tester and the cores under test (CUTs). To enable test concurrency, local test buffers are added to all CUTs. In order to limit the buffer area overhead while minimizing the test application time, we propose a packet-based scheduling algorithm called PAcket Set Scheduling (PASS), which finds the complete packet delivery schedule under a given power constraint. The use of test packets, each consisting of a small number of bits of test data, for test data delivery allows an efficient sharing of the bus bandwidth with the help of an effective buffer-based test architecture. The experimental results show that the methodology is highly effective, especially for smaller bus widths, compared to previous approaches that do not use the functional bus.

  • Test Scheduling for Multi-Clock Domain SoCs under Power Constraint

    Tomokazu YONEDA  Kimihiko MASUDA  Hideo FUJIWARA  

     
    PAPER-High-Level Testing

      Page(s):
    747-755

    This paper presents a power-constrained test scheduling method for multi-clock-domain SoCs, which consist of cores operating at different clock frequencies during test. In the proposed method, we utilize virtual TAMs to bridge the frequency gaps between the cores and the ATE. Moreover, we present a virtual-TAM-based technique that reduces the power consumption of cores during test while the test time of the cores remains the same or increases only slightly. Experimental results show the effectiveness of the proposed method.

  • A Self-Test of Dynamically Reconfigurable Processors with Test Frames

    Tomoo INOUE  Takashi FUJII  Hideyuki ICHIHARA  

     
    PAPER-High-Level Testing

      Page(s):
    756-762

    This paper proposes a self-test method for coarse-grain dynamically reconfigurable processors (DRPs) that incurs no hardware overhead. In the method, processor elements (PEs) are organized into a test frame consisting of test pattern generators (TPGs), processor elements under test (PEUTs), and response analyzers (RAs), so that the PEs test one another as the test frames are changed appropriately. We design several test frames with different structures and discuss how the structures relate to the numbers of contexts and test frames required to test all the functions of the PEs. A case study shows that there exists an optimal test frame that minimizes the test application time under a given constraint.

  • Design for Testability Method to Avoid Error Masking of Software-Based Self-Test for Processors

    Masato NAKAZATO  Michiko INOUE  Satoshi OHTAKE  Hideo FUJIWARA  

     
    PAPER-High-Level Testing

      Page(s):
    763-770

    In this paper, we propose a design-for-testability method for test programs of software-based self-test that uses test program templates. Template-based software-based self-test suffers from error masking, in which some faults detected during test generation for a module escape detection by the test program synthesized from those tests. The proposed method achieves 100% template-level fault efficiency, that is, it completely avoids error masking. Moreover, the method incurs no performance degradation (it adds only observation points) and enables at-speed testing.

  • Post-BIST Fault Diagnosis for Multiple Faults

    Hiroshi TAKAHASHI  Yoshinobu HIGAMI  Shuhei KADOYAMA  Yuzo TAKAMATSU  Koji YAMAZAKI  Takashi AIKYO  Yasuo SATO  

     
    LETTER

      Page(s):
    771-775

    With the increasing complexity of LSIs, Built-In Self-Test (BIST) is a promising technique for production testing. We herein propose a method for diagnosing multiple stuck-at faults based on the compressed responses of BIST. We refer to fault diagnosis based on the ambiguous test pattern set obtained from the compressed responses of BIST as post-BIST fault diagnosis [1]. In the present paper, we propose an effective method for performing post-BIST fault diagnosis for multiple stuck-at faults. The success ratio of the method and the feasibility of diagnosing large circuits are discussed.

  • A Secure Test Technique for Pipelined Advanced Encryption Standard

    Youhua SHI  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER

      Page(s):
    776-780

    In this paper, we present a Design-for-Secure-Test (DFST) technique for pipelined AES that guarantees both security and test quality during testing. Unlike previous works, the proposed method keeps all secrets inside the chip while providing high test quality and fault diagnosis capability. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.

  • Regular Section
  • A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns

    Kazuya HARAGUCHI  Mutsunori YAGIURA  Endre BOROS  Toshihide IBARAKI  

     
    PAPER-Algorithm Theory

      Page(s):
    781-788

    We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing "deceptive" good patterns: in a small data set, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, to remove such deceptive good patterns, the data set should contain more examples than the size at which a random data set contains few good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
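
    The notion of a pattern and its support can be made concrete in a few lines of Python; the data, the pattern encoding, and the thresholds below are hypothetical illustrations of the definitions above.

```python
# A pattern is a value assignment to a subset of Boolean variables; it is
# "good" if frequent among true examples and infrequent among false ones.

def support(pattern, examples):
    """Fraction of examples matching every (index, value) pair of pattern."""
    if not examples:
        return 0.0
    hits = sum(all(x[i] == v for i, v in pattern.items()) for x in examples)
    return hits / len(examples)

true_ex  = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
false_ex = [(0, 0, 1), (1, 1, 0)]
pattern  = {0: 1, 2: 1}                    # x0 = 1 and x2 = 1

t, f = support(pattern, true_ex), support(pattern, false_ex)
print(t, f, t >= 0.5 and f <= 0.2)         # a crude "good pattern" test
```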

  • Near-Optimal Block Alignments

    Kuo-Tsung TSENG  Chang-Biau YANG  Kuo-Si HUANG  Yung-Hsing PENG  

     
    PAPER-Algorithm Theory

      Page(s):
    789-795

    The optimal alignment of two given biosequences is mathematically optimal, but it may not be biologically optimal. To investigate more possible alignments with biological meaning, one can relax the scoring functions to obtain near-optimal alignments. Although near-optimal alignments increase the chance of finding the correct alignment, they may confuse biologists because the set of candidates is large. In this paper, we present a filtering scheme for near-optimal alignments. A simple method for tracing near-optimal alignments and an algorithm for filtering them are proposed. The time complexity of our algorithm is O(dmn) in the worst case, where d is the maximum distance between a near-optimal alignment and the optimal alignment, and m and n are the lengths of the input sequences.
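
    One standard way to trace near-optimal cells, shown below as a hedged sketch (the scoring scheme is an assumption, and this is not necessarily the authors' algorithm), combines a forward and a backward DP table: a cell lies on some alignment scoring at least opt - d exactly when the two scores through it sum to at least opt - d.

```python
def dp(a, b, match=1, mismatch=-1, gap=-1):
    """Global-alignment DP table for sequences a and b."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s,
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S

def near_optimal_cells(a, b, d):
    """Cells (i, j) on some alignment scoring >= optimal - d: the forward
    score to (i, j) plus the backward score from (i, j)."""
    F, Brev = dp(a, b), dp(a[::-1], b[::-1])
    m, n = len(a), len(b)
    opt = F[m][n]
    return [(i, j) for i in range(m + 1) for j in range(n + 1)
            if F[i][j] + Brev[m - i][n - j] >= opt - d]

# Prints every DP cell on an alignment scoring at least opt - 1.
print(near_optimal_cells("GATT", "GAT", d=1))
```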

  • Dynamic Scheduling Real-Time Task Using Primary-Backup Overloading Strategy for Multiprocessor Systems

    Wei SUN  Chen YU  Xavier DEFAGO  Yasushi INOGUCHI  

     
    PAPER-Dependable Computing

      Page(s):
    796-806

    The scheduling of real-time tasks with fault-tolerance requirements is an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerance technique to guarantee the deadlines of tasks despite the presence of faults. In this paper we propose a dynamic PB-based task scheduling approach in which an allocation parameter is used to search the available time slots for a newly arriving task, and previously scheduled tasks can be rescheduled when no time slot is available for the new task. To improve schedulability, we also propose an overloading strategy covering both PB overloading and backup-backup (BB) overloading. Our task scheduling algorithm is compared with existing scheduling algorithms from the literature through simulation studies. The results show that the task rejection ratio of our algorithm is almost 50% lower than that of the compared algorithms.
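
    The BB-overloading idea admits a compact illustration: two backups may share a time slot on the same processor only if their primaries run on different processors, since a single processor fault can then activate at most one of the backups. The check below is a hedged sketch (field names are hypothetical, and the paper's algorithm involves far more than this feasibility test).

```python
def bb_overload_ok(backup_a, backup_b):
    """Each backup records its time slot and its primary's processor."""
    same_slot = not (backup_a["end"] <= backup_b["start"] or
                     backup_b["end"] <= backup_a["start"])
    return (not same_slot) or (backup_a["primary_cpu"] != backup_b["primary_cpu"])

b1 = {"primary_cpu": 0, "start": 4, "end": 7}
b2 = {"primary_cpu": 1, "start": 5, "end": 8}
print(bb_overload_ok(b1, b2))   # True: slots overlap, but primaries differ
```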

  • Effective Domain Partitioning for Multi-Clock Domain IP Core Wrapper Design under Power Constraints

    Thomas Edison YU  Tomokazu YONEDA  Danella ZHAO  Hideo FUJIWARA  

     
    PAPER-Dependable Computing

      Page(s):
    807-814

    The rapid advancement of VLSI technology has made it possible for chip designers and manufacturers to embed the components of a whole system onto a single chip, called a System-on-Chip or SoC. SoCs make use of pre-designed modules, called IP cores, which provide shorter design times and quicker time-to-market. Furthermore, SoCs that operate across multiple clock domains under very low power budgets are being used in the latest communications, networking, and signal processing devices. As a result, the testing of SoCs and multi-clock domain embedded cores under power constraints has been rapidly gaining importance. In this research, a novel method for designing power-aware test wrappers for embedded cores with multiple clock domains is presented. By effectively partitioning the various clock domains, we increase the solution space of possible test schedules for the core. Whereas previous methods were limited to testing all the clock domains concurrently, we remove this limitation by making use of bandwidth conversion and multiple shift frequencies, and by gating the clock signals to control the shift activity of the core logic. The combination of these techniques gives us greater flexibility in determining an optimal test schedule under very tight power constraints. Furthermore, since it is computationally intensive to search the entire expanded solution space of possible test schedules, we propose a heuristic 3-D bin packing algorithm that determines the optimal wrapper architecture and test schedule while minimizing the test time under power and bandwidth constraints.

  • Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

    Shoei SATO  Akio KOBAYASHI  Kazuo ONOE  Shinichi HOMMA  Toru IMAI  Tohru TAKAGI  Tetsunori KOBAYASHI  

     
    PAPER-Speech and Hearing

      Page(s):
    815-824

    We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments and speaking styles; such a robust stream is expected to show high discriminative ability. A conventional method, proposed for the recognition of spoken digits, calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real time from the mutual information between an input stream and the active HMM states in the search space, without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features extracted through auditory filters, motivated by the human auditory system's ability to extract amplitude and frequency modulations; the resulting features represent energy, amplitude drift, and resonant-frequency drift, and are expected to provide complementary cues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9.2% for field reports and 4.7% for spontaneous commentaries, relative to the best result obtained from a single stream.
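
    One plausible reading of the entropy-based weighting, sketched below with hypothetical inputs (the paper's exact normalization may differ): a stream whose posterior over the active states is sharply peaked carries more information than one whose posterior is nearly uniform.

```python
import math

def stream_weight(likelihoods):
    """likelihoods: one stream's likelihoods over the active HMM states at
    one frame. Returns a mutual-information-style weight in [0, 1]: the
    marginal entropy of a uniform distribution over the N active states
    minus the entropy of the stream's posterior, normalized (assumed)."""
    total = sum(likelihoods)
    post = [l / total for l in likelihoods]
    h = -sum(p * math.log(p) for p in post if p > 0.0)
    h_max = math.log(len(post))
    return (h_max - h) / h_max if h_max > 0.0 else 0.0

print(stream_weight([0.7, 0.1, 0.1, 0.1]),      # peaked posterior -> ~0.32
      stream_weight([0.25, 0.25, 0.25, 0.25]))  # flat posterior   -> 0.0
```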

  • Transformed-Domain Mode Selection for H.264 Intra-Prediction Improvement

    Yung-Chiang WEI  Jar-Ferr YANG  

     
    PAPER-Image Processing and Video Processing

      Page(s):
    825-835

    In this paper, a fast mode decision method for intra-prediction is proposed to reduce the computational complexity of H.264/AVC encoders. Using edge information, we propose a novel fast estimation algorithm that reduces the mode-selection overhead of H.264/AVC, where the edge direction of each coding block is detected from only part of the transformed coefficients; the computational complexity is thereby greatly reduced. Experimental results show that the proposed fast mode decision method eliminates about 81.34% of the encoding time for all-intra-frame sequences with acceptable degradation in average PSNR and bitrate.
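
    The intuition that a few transformed coefficients reveal the edge direction can be sketched as follows; the 4x4 block, the coefficient groups, and the threshold are illustrative assumptions, not the paper's exact classifier.

```python
def edge_direction(coeffs):
    """coeffs: 4x4 transform coefficients, coeffs[0][0] being DC.
    First-row AC energy responds to horizontal frequency (vertical edges);
    first-column AC energy responds to vertical frequency (horizontal edges)."""
    row_energy = sum(c * c for c in coeffs[0][1:])
    col_energy = sum(r[0] * r[0] for r in coeffs[1:])
    if row_energy > 2 * col_energy:
        return "vertical"
    if col_energy > 2 * row_energy:
        return "horizontal"
    return "non-directional"

block = [[80, 45, 12, 3],
         [ 4,  2,  1, 0],
         [ 3,  1,  0, 0],
         [ 1,  0,  0, 0]]
print(edge_direction(block))   # -> "vertical"
```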

  • Reversible Steganographic Method with High Payload for JPEG Images

    Chih-Yang LIN  Chin-Chen CHANG  Yu-Zheng WANG  

     
    PAPER-Image Processing and Video Processing

      Page(s):
    836-845

    This paper presents a lossless steganographic method based on the multiple-base notation approach for JPEG images. Embedding a large amount of secret data in a JPEG-compressed image is a challenge, since modifying the quantized DCT coefficients may cause serious image distortion. We propose two main strategies to deal with this problem: (1) we embed the secret values in the middle-frequency quantized DCT coefficients, and (2) we limit the number of nonzero quantized DCT coefficients that participate in the embedding process. We also investigate the effect of modifying the standard quantization table. The experimental results show that the proposed method can embed twice as much secret data as the irreversible embedding method of Iwata et al. for the same number of embedded sets. The results also demonstrate how three important factors, namely (1) the quantization table, (2) the number of selected nonzero quantized DCT coefficients, and (3) the number of selected sets, influence the image quality and embedding capacity.
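
    One simple reversible trick in the multiple-base spirit, shown below as a hedged sketch (the paper's coefficient selection, base assignment, and distortion control are considerably more involved): expand a coefficient by the base and add the digit, so both the digit and the original value are exactly recoverable.

```python
def embed_digit(coeff, digit, base):
    """Reversibly embed 0 <= digit < base into an integer coefficient."""
    assert 0 <= digit < base
    return coeff * base + (digit if coeff >= 0 else -digit)

def extract_digit(stego, base):
    """Recover (digit, original coefficient) exactly."""
    return abs(stego) % base, int(stego / base)   # truncate toward zero

c, d, b = -3, 2, 5
s = embed_digit(c, d, b)
print(s, extract_digit(s, b))   # -> -17 (2, -3)
```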

  • An Effective Load Balancing Scheme for 3D Texture-Based Sort-Last Parallel Volume Rendering on GPU Clusters

    Won-Jong LEE  Vason P. SRINI  Woo-Chan PARK  Shigeru MURAKI  Tack-Don HAN  

     
    PAPER-Computer Graphics

      Page(s):
    846-856

    We present an adaptive dynamic load balancing scheme for 3D-texture-based sort-last parallel volume rendering on a PC cluster equipped with GPUs. Our scheme exploits not only task parallelism but also data parallelism during rendering by combining hierarchical data structures (an octree and a parallel BSP tree) to skip empty regions and distribute proper workloads to the rendering nodes. The scheme also performs valid parallel rendering and image compositing in visibility order by employing a 3D clustering algorithm. To alleviate the imbalance that arises when the transfer function is changed, load rebalancing is supported inexpensively by exchanging only the needed data. A detailed performance analysis is provided and the scaling characteristics of our scheme are discussed. They show that our scheme achieves significant performance gains over traditional static distribution schemes by increasing parallelism and decreasing synchronization costs.

  • A Checkpointing Method with Small Checkpoint Latency

    Masato KITAKAMI  Bochuan CAI  Hideo ITO  

     
    LETTER-Dependable Computing

      Page(s):
    857-861

    The cost of checkpointing consists of checkpoint overhead and checkpoint latency. The former is the time during which the process is stopped for checkpointing. The latter is the time to complete the checkpointing, including the background checkpointing that stores memory pages. A large checkpoint latency increases the possibility that an error occurs during background checkpointing, which leads to a long rollback distance. No method aimed at small checkpoint latency has been proposed so far. This paper proposes a checkpointing method that achieves small checkpoint latency. The proposed method divides a checkpoint interval into several subcheckpoint intervals. Using the history of memory-page modification over the subcheckpoint intervals, the method saves in advance the pages that are not expected to be modified in the remainder of the checkpoint interval. Computer simulations show that the proposed method can reduce the checkpoint latency by 25% compared to existing methods.
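
    The page-selection idea admits a compact sketch; the quiet-interval policy below is a hypothetical stand-in for the paper's prediction rule.

```python
# Pages not dirtied in recent subcheckpoint intervals are unlikely to be
# dirtied again, so they can be written out early, shrinking the work left
# at the checkpoint itself and hence the checkpoint latency.

def pages_to_presave(history, quiet_for=2):
    """history[p] is a list of booleans, one per subinterval (True = page p
    was modified). Pre-save pages quiet for the last `quiet_for` intervals."""
    return [p for p, mods in history.items()
            if len(mods) >= quiet_for and not any(mods[-quiet_for:])]

history = {
    0: [True,  False, False],   # quiet lately -> pre-save
    1: [False, True,  True],    # hot -> save at the checkpoint
    2: [False, False, False],   # quiet -> pre-save
}
print(pages_to_presave(history))   # -> [0, 2]
```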

  • Robust Watermarking Scheme Applied to Radiological Medical Images

    Raul RODRIGUEZ COLIN  Claudia FEREGRINO URIBE  Jose-Alberto MARTINEZ VILLANUEVA  

     
    LETTER-Application Information Security

      Page(s):
    862-864

    We present a watermarking scheme that combines data compression and encryption, applied to radiological medical images. In this approach we combine image moment theory and image homogeneity in order to recover the watermark after a geometric distortion. Image quality is measured with metrics used in image processing, such as PSNR and MSE.
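
    For reference, the two quality metrics named above are standard; a minimal implementation for 8-bit images follows (a textbook formulation, not code from the paper).

```python
import numpy as np

def mse(original, distorted):
    """Mean squared error between two images of the same shape."""
    o = np.asarray(original, dtype=np.float64)
    d = np.asarray(distorted, dtype=np.float64)
    return float(np.mean((o - d) ** 2))

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    e = mse(original, distorted)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

a = np.array([[52, 60], [61, 70]])
b = np.array([[50, 61], [62, 68]])
print(mse(a, b), psnr(a, b))   # -> 2.5 and about 44.1 dB
```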

  • A Feasibility Study of Fuzzy FES Controller Based on Cycle-to-Cycle Control: An Experimental Test of Knee Extension Control

    Takashi WATANABE  Tomoya MASUKO  Achmad ARIFIN  Makoto YOSHIZAWA  

     
    LETTER-Rehabilitation Engineering and Assistive Technology

      Page(s):
    865-868

    Functional Electrical Stimulation (FES) can be effective in assisting or restoring paralyzed motor functions. The purpose of this study is to examine experimentally a fuzzy controller based on cycle-to-cycle control for FES-induced gait. A basic experimental test of controlling the maximum knee-extension angle was performed with normal subjects. In most control trials, the joint angle was controlled well, compensating for changes in the muscle response to electrical stimulation. The results suggest that the fuzzy controller would be practical for clinical applications of FES gait control. In practice, automatic parameter tuning would be required so that the target is reached quickly and changes in muscle response are compensated without causing oscillatory responses.

  • Modeling Bottom-Up Visual Attention for Color Images

    Congyan LANG  De XU  Ning LI  

     
    LETTER-Image Processing and Video Processing

      Page(s):
    869-872

    Modeling visual attention provides an alternative methodology for image description in many applications, such as adaptive content delivery and image retrieval. In this paper, we propose a robust approach to modeling bottom-up visual attention. The main contributions are twofold: 1) we use principal component analysis (PCA) to transform the RGB color space into three principal components, which intrinsically yields an opponent representation of colors and thereby supports good saliency analysis; 2) we present a practicable framework for modeling visual attention based on a region-level reliability analysis of each feature map, from which the saliency map can be robustly generated for a wide variety of natural images. Experiments show that the proposed algorithm is effective and characterizes human perception well.
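
    The PCA color transform in contribution 1) can be sketched directly; the code below is a generic implementation of that step (array shapes and naming are assumptions), not the authors' code.

```python
import numpy as np

def pca_color_transform(image):
    """image: H x W x 3 RGB array. Returns H x W x 3 principal components,
    ordered by decreasing explained variance. The first component tends to
    capture luminance, the others opponent-like chromatic axes (the exact
    components depend on the image)."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)
    pixels -= pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return (pixels @ eigvecs[:, order]).reshape(h, w, 3)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4, 3))
print(pca_color_transform(img).shape)        # -> (4, 4, 3)
```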