
Keyword Search Result

[Keyword] noise-robust (5 hits)

1-5 hits
  • Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment

    Masanori TSUJIKAWA  Yoshinobu KAJIKAWA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2023/02/28
      Vol:
    E106-A No:9
      Page(s):
    1224-1233

    In this paper, we propose a low-complexity and accurate noise suppression method based on an a priori SNR (speech-to-noise ratio) model that improves robustness with respect to short-term noise fluctuation. The a priori SNR, the ratio of the speech spectrum to the noise spectrum in the spectral domain, corresponds to the difference between speech features and noise features in feature domains such as the mel-cepstral domain and the logarithmic power spectral domain, because logarithmic operations are used for the domain conversions. Therefore, an a priori SNR model can easily be expressed as the difference between the speech model and the noise model, both of which are Gaussian mixture models, and it can be generated at low computational cost. Using a priori SNRs accurately estimated from this model, it is possible to calculate accurate noise suppression filter coefficients that take the noise variance into account, without a serious increase in computational cost over a conventional model-based Wiener filter (MBW). We conducted an in-car speech recognition evaluation on the CENSREC-2 database; compared with a conventional MBW, the proposed method reduced the recognition error rate by 9% across all noise environments and, notably, by 11% in audio-noise environments. We also show, through an implementation on a digital signal processor, that the proposed method requires only low levels of computational and memory resources.
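    The log-domain identity behind the a priori SNR model, and the Wiener gain that an MBW-style filter derives from it, can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes per-bin speech and noise values are given as natural-log power (for instance, means of a speech GMM component and a noise GMM component), and, unlike the proposed method, it ignores the noise variance. The function names are hypothetical.

```python
import math


def a_priori_snr_db(speech_logpow, noise_logpow):
    """Per-bin a priori SNR in dB from natural-log power values.

    Since log(S / N) = log S - log N, the SNR in the log-power domain is just
    the difference between the speech and noise features; 10 / ln(10)
    converts that natural-log difference to decibels.
    """
    return [10.0 / math.log(10.0) * (s - n)
            for s, n in zip(speech_logpow, noise_logpow)]


def wiener_gain(speech_logpow, noise_logpow):
    """Per-bin Wiener filter gain xi / (1 + xi), xi being the linear a priori SNR."""
    gains = []
    for s, n in zip(speech_logpow, noise_logpow):
        xi = math.exp(s - n)  # back to the linear ratio S / N
        gains.append(xi / (1.0 + xi))
    return gains
```

    For linear a priori SNRs of 4, 1, and 0.25, the gains are 0.8, 0.5, and 0.2: bins dominated by speech are passed almost unchanged, while bins dominated by noise are attenuated.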

  • Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition

    Byung Ok KANG  Oh-Wook KWON  

     
    PAPER-Speech and Hearing

      Publicized:
    2015/11/24
      Vol:
    E99-D No:3
      Page(s):
    724-730

    We propose a new method that combines multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors caused by unavoidable mismatches in speaking style or environment between training and real-world conditions. To handle this problem, a multi-style training approach has conventionally been used, in which a single large acoustic model is trained on a large speech database covering various speaking styles and kinds of environmental noise. In this work, by contrast, we combine multiple sub-models, each trained for a different speaking style or environmental noise, into one large acoustic model by maximizing the log-likelihood of the sub-model states that share the same phonetic context and position. The combined acoustic model is then used in a new target system, which is robust to variation in speaking style and to diverse environmental noise. Experimental results show that the proposed method significantly outperforms the conventional methods in two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
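    To make "combining models in GMM space" concrete, the sketch below merges the GMMs that several sub-models hold for the same state (same phonetic context and position) into one larger GMM. It is a plain mixture-of-mixtures pooling with fixed, hypothetical combination weights, not the log-likelihood-maximizing combination described in the paper; it only shows that the pooled components still form a valid density. Diagonal covariances and all names are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: list[float]
    var: list[float]  # diagonal covariance entries


def log_gauss(x, g):
    """Log density of a diagonal-covariance Gaussian at x (mixture weight excluded)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, g.mean, g.var)
    )


def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM, computed with log-sum-exp for stability."""
    terms = [math.log(g.weight) + log_gauss(x, g) for g in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def pool_gmms(sub_models, mix=None):
    """Merge the GMMs of one shared state from several sub-models into one GMM.

    mix holds hypothetical per-sub-model combination weights (uniform if None);
    each sub-model's component weights are scaled by its mix weight.
    """
    if mix is None:
        mix = [1.0 / len(sub_models)] * len(sub_models)
    if abs(sum(mix) - 1.0) > 1e-9:
        raise ValueError("combination weights must sum to 1")
    return [
        Gaussian(lam * g.weight, list(g.mean), list(g.var))
        for lam, gmm in zip(mix, sub_models)
        for g in gmm
    ]
```

    Pooling keeps every sub-model's components intact, so the combined state can cover, for example, both clean and noisy conditions; the paper's contribution lies in how the combination is optimized, which this sketch does not attempt.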

  • A Noise-Robust Continuous Speech Recognition System Using Block-Based Dynamic Range Adjustment

    Yiming SUN  Yoshikazu MIYANAGA  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:3
      Page(s):
    844-852

    This paper proposes a new approach to speech feature estimation in noisy conditions for noise-robust continuous speech recognition (CSR). Running spectrum analysis (RSA), running spectrum filtering (RSF), and dynamic range adjustment (DRA) have been developed as noise-robust techniques for isolated word speech recognition, but of these, only RSA has been applied to a CSR system. This paper proposes an extended DRA for a noise-robust CSR system. At the recognition stage, the continuous speech waveform is automatically divided into blocks, each defined by a short time length, and the extended DRA is applied to these estimated blocks. The proposed method improves the average recognition rate under several different noise conditions, raising recognition rates by up to 15% for various noises at 10 dB SNR.
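    The block-based idea can be sketched as follows, assuming that DRA scales each feature dimension so that its maximum absolute value over the normalization span equals 1, and that the extended method simply uses each block as that span. The abstract does not say how block boundaries are estimated, so this sketch uses fixed-length blocks; `block_len` and the function names are hypothetical.

```python
def dra(frames):
    """Scale each feature dimension so its max |value| over the frames is 1.

    frames: list of feature vectors (one per analysis frame), all the same length.
    A dimension that is identically zero is left at zero.
    """
    dims = len(frames[0])
    peaks = [max(abs(f[d]) for f in frames) for d in range(dims)]
    return [
        [f[d] / peaks[d] if peaks[d] > 0 else 0.0 for d in range(dims)]
        for f in frames
    ]


def block_dra(frames, block_len=50):
    """Split the utterance into consecutive blocks of block_len frames and
    apply DRA to each block independently (the last block may be shorter)."""
    out = []
    for start in range(0, len(frames), block_len):
        out.extend(dra(frames[start:start + block_len]))
    return out
```

    Normalizing per block rather than per utterance lets the scaling follow changes in level along a long continuous utterance, which is the motivation for applying DRA to short blocks in CSR.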

  • Integrated Ambient Light Sensor with an LTPS Noise-Robust Circuit and a-Si Photodiodes for AMLCDs (Open Access)

    Fumirou MATSUKI  Kazuyuki HASHIMOTO  Keiichi SANO  Fu-Yuan HSUEH  Ramesh KAKKAD  Wen-Sheng CHANG  J. Richard AYRES  Martin EDWARDS  Nigel D. YOUNG  

     
    INVITED PAPER

      Vol:
    E93-C No:11
      Page(s):
    1583-1589

    Ambient light sensors have been used to reduce the power consumption of active matrix liquid crystal displays (AMLCDs) by adjusting display brightness according to ambient illumination. Discrete sensors have commonly been used for this purpose, but they complicate module design, so integrating the sensors on the display panel is desirable. Many kinds of integrated sensors have been developed so far using amorphous silicon (a-Si) or low temperature polycrystalline silicon (LTPS) technology. These conventional integrated sensors have two problems. First, LTPS sensors have a narrower dynamic range because LTPS photodiodes have lower photosensitivity. Second, both LTPS and a-Si sensors are susceptible to display driving noise. In this paper, we introduce a novel integrated sensor that uses both LTPS and a-Si technologies to solve these problems. It consists of vertical a-Si Schottky photodiodes and an LTPS differential converter circuit. The a-Si photodiodes have much higher photosensitivity than LTPS photodiodes, which contributes to a wide dynamic range and high accuracy. The LTPS differential converter circuit converts the photocurrent of the photodiodes into a robust digital signal and also cancels the influence of display driving noise, so the sensor works stably and accurately even in the presence of that noise. The performance of the sensor was measured to verify the advantages of the novel design. The measurements showed that it worked over a wide ambient illuminance range of 5 to 55,000 lux with errors below 5%, and that it worked stably and accurately even under display driving noise. The sensor thus achieves both a wide dynamic range and noise robustness.
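    The principle by which a differential converter rejects display driving noise can be illustrated numerically. The sketch below is a behavioral model under loudly stated assumptions, not the paper's LTPS circuit: it assumes a hypothetical reference channel that picks up the same coupled noise as the sensing photodiode but no photocurrent, integrates the difference of the two currents, and quantizes the resulting charge with a hypothetical charge-per-count step `q_lsb`.

```python
def differential_count(sense, reference, dt, q_lsb):
    """Integrate (sense - reference) current over time and quantize to counts.

    sense:     current samples (A) from the light-exposed photodiode channel
    reference: current samples (A) from a hypothetical reference channel that
               sees the same display driving noise but no photocurrent
    dt:        sampling interval (s)
    q_lsb:     charge (C) represented by one output count (hypothetical)
    Noise coupled equally into both channels cancels in the subtraction.
    """
    charge = sum((s - r) * dt for s, r in zip(sense, reference))
    return round(charge / q_lsb)


photo = 1e-9  # 1 nA photocurrent
noise = [5e-9 if k % 10 == 0 else 0.0 for k in range(100)]  # periodic noise spikes
sense = [photo + n for n in noise]

print(differential_count(sense, noise, 1e-3, 1e-12))        # differential: 100
print(differential_count(sense, [0.0] * 100, 1e-3, 1e-12))  # single-ended: 150
```

    With the reference channel, the count reflects only the photocurrent (100 counts); reading the sensing channel alone inflates it to 150 counts because the noise spikes are integrated as if they were light.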

  • Automatic Adjustment of Subband Likelihood Recombination Weights for Improving Noise-Robustness of a Multi-SNR Multi-Band Speaker Identification System

    Kenichi YOSHIDA  Kazuyuki TAKAGI  Kazuhiko OZEKI  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:11
      Page(s):
    2453-2459

    This paper is concerned with improving the noise robustness of a multi-SNR multi-band speaker identification system by introducing automatic adjustment of the subband likelihood recombination weights. The adjustment is based on the subband power of the noise observed in the input signal just before the speech starts. To evaluate the noise robustness of this system, text-independent speaker identification experiments were conducted on speech data corrupted with noise recorded in five environments: "bus," "car," "office," "lobby," and "restaurant." At 0 dB SNR, the present method reduced the identification error by 15.9% compared with the multi-SNR multi-band method with equal recombination weights. The present method was also compared with a clean fullband method, in which speaker models are trained on clean speech data and spectral subtraction is applied to the input signal at the identification stage. Taking the clean fullband method without spectral subtraction as the baseline at 0 dB SNR, the multi-SNR multi-band method with automatic adjustment of recombination weights achieved an average error reduction of 56.8%, whereas the clean fullband method with spectral subtraction achieved an average error reduction of 11.4%.
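    Subband likelihood recombination with noise-dependent weights can be sketched as below. The abstract states only that the weights are derived from the subband power of the noise observed before speech onset; the specific rule used here (weights inversely proportional to that power, normalized to sum to 1) is a hypothetical stand-in, as are the function names and the data layout.

```python
def recombination_weights(noise_power):
    """Hypothetical rule: weight each subband inversely to its pre-speech
    noise power, then normalize so the weights sum to 1."""
    inv = [1.0 / max(p, 1e-12) for p in noise_power]  # guard against zero power
    total = sum(inv)
    return [v / total for v in inv]


def identify(subband_loglik, weights):
    """Return the speaker with the highest weighted sum of subband log-likelihoods.

    subband_loglik: {speaker: [log L_1, ..., log L_B]} for one test utterance.
    """
    def score(speaker):
        return sum(w * ll for w, ll in zip(weights, subband_loglik[speaker]))
    return max(subband_loglik, key=score)
```

    For example, if the third subband is heavily corrupted (pre-speech noise power 100 versus 1 in the other two), its weight drops to about 0.005, so a speaker who scores well only in that noisy band no longer wins the decision, whereas with equal weights that band could dominate.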