
Keyword Search Result

[Keyword] speech detection (3 hits)

  • Two-Sided LPC-Based Speckle Noise Removal for Laser Speech Detection Systems

    Yahui WANG  Wenxi ZHANG  Xinxin KONG  Yongbiao WANG  Hongxin ZHANG  

     
    PAPER-Speech and Hearing

      Publicized: 2021/03/17
      Vol: E104-D No: 6
      Page(s): 850-862

    Laser speech detection uses a non-contact Laser Doppler Vibrometry (LDV)-based acoustic sensor to obtain speech signals by precisely measuring voice-generated surface vibrations. Over long distances, however, the detected signal is very weak and full of speckle noise. To enhance the quality and intelligibility of the detected signal, we designed a two-sided Linear Prediction Coding (LPC)-based locator and interpolator to detect and replace speckle noise. We first studied the characteristics of speckle noise in detected signals and developed a binary-state statistical model for speckle noise generation. A two-sided LPC-based locator, composed of an inverse decorrelator, a nonlinear filter and a threshold estimator, was then designed to locate the polluted samples. By improving the noise-to-signal ratio (NSR), this locator greatly improves the detectability of speckle noise and avoids false and missed detections. Finally, samples from both sides of the speckle noise were used to estimate the parameters of the interpolator and to code samples for replacing the polluted samples. Real-world speckle noise removal experiments and simulation-based comparative experiments were conducted, and the results show that the proposed method is better able to locate speckle noise in laser-detected speech and is highly effective at replacing it.
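
The locate-then-interpolate idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the nonlinear filter and threshold estimator are replaced here by an assumed median-based threshold, the "two-sided" step is approximated by keeping the smaller of the forward and backward LPC prediction errors, and all names (`lpc`, `locate_impulses`, `interpolate`, `k_sigma`) and the synthetic test signal are hypothetical.

```python
import random
import statistics

def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion: returns a_1..a_p with x[n] ~ sum_k a_k * x[n-k]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def predict(x, n, coeffs, step):
    """Predict x[n] from earlier samples (step=-1) or later samples (step=+1)."""
    return sum(c * x[n + step * (k + 1)] for k, c in enumerate(coeffs))

def locate_impulses(x, order=4, k_sigma=8.0):
    """Flag samples whose forward AND backward prediction errors are both large."""
    coeffs = lpc(x, order)
    idx = range(order, len(x) - order)
    # An impulse inflates the forward error at the following samples and the
    # backward error at the preceding ones; only the impulse itself is large in both.
    resid = [min(abs(x[n] - predict(x, n, coeffs, -1)),
                 abs(x[n] - predict(x, n, coeffs, +1))) for n in idx]
    scale = 1.4826 * statistics.median(resid)  # assumed robust scale estimate
    hits = [n for n, e in zip(idx, resid) if e > k_sigma * scale]
    return hits, coeffs

def interpolate(x, hits, coeffs):
    """Replace each flagged sample by the mean of its two one-sided predictions."""
    y = list(x)
    for n in hits:
        y[n] = 0.5 * (predict(x, n, coeffs, -1) + predict(x, n, coeffs, +1))
    return y

# Synthetic stand-in for speech: a stable AR(2) process with one injected spike.
random.seed(0)
clean = [0.0, 0.0]
for _ in range(800):
    clean.append(1.6 * clean[-1] - 0.8 * clean[-2] + random.gauss(0.0, 1.0))
noisy = list(clean)
noisy[300] += 30.0

hits, coeffs = locate_impulses(noisy)
repaired = interpolate(noisy, hits, coeffs)
print(hits, abs(repaired[300] - clean[300]))
```

Taking the minimum of the two one-sided errors is one simple way to see why using both sides helps localization: a one-sided residual also spikes at the neighbours of the corrupted sample. This interpolator only handles isolated spikes and assumes the neighbours are clean.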

  • An Approach Using Combination of Multiple Features through Sigmoid Function for Speech-Presence/Absence Discrimination

    Kun-Ching WANG  Chiun-Li CHIN  

     
    PAPER-Engineering Acoustics

      Vol: E94-A No: 8
      Page(s): 1630-1637

    In this paper, we present an approach to detecting speech presence in which the decision rule is based on a combination of multiple features through a sigmoid function. Minimum classification error (MCE) training is used to adjust the combination weights. The features consist of three parameters (the ratio of ZCR, the spectral energy, and the spectral entropy) and are combined linearly with weights derived from the sub-band domain. First, the Bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical sub-bands. Next, the feature parameters are derived from the selected frequency sub-bands to form robust voice feature parameters. In order to discard seriously corrupted frequency sub-bands, a strategy of adaptive frequency sub-band extraction (AFSE) dependent on the sub-band SNR is applied, so that only the useful frequency sub-bands are used. Finally, these three feature parameters, which consider only the useful sub-bands, are combined through a sigmoid-type function incorporating optimal weights based on MCE training to classify each frame as speech-present or speech-absent. Experimental results show that the proposed algorithm outperforms standard methods such as G.729B and AMR2.
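
The core decision rule described here, a weighted linear combination of features passed through a sigmoid, can be sketched as follows. This is only an illustration under stated assumptions: the features are computed on the full band rather than on BSWD sub-bands, plain zero-crossing rate stands in for the paper's "ratio of ZCR", and the weights and bias are hand-picked hypothetical values, not MCE-trained ones.

```python
import cmath
import math
import random

def zero_crossing_rate(frame):
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def log_energy(frame):
    return math.log10(sum(x * x for x in frame) / len(frame) + 1e-12)

def spectral_entropy(frame):
    """Entropy of the normalized power spectrum, scaled to the range [0, 1]."""
    n = len(frame)
    power = [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                     for i, x in enumerate(frame))) ** 2
             for k in range(n // 2 + 1)]
    total = sum(power) or 1.0
    probs = [p / total for p in power if p > 0.0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(power))

# Hypothetical weights and bias chosen by hand for this example only.
WEIGHTS = {"energy": 2.0, "entropy": -4.0, "zcr": -2.0}
BIAS = 6.0

def speech_presence_score(frame):
    """Sigmoid of the weighted feature sum; values near 1 suggest speech presence."""
    z = (WEIGHTS["energy"] * log_energy(frame)
         + WEIGHTS["entropy"] * spectral_entropy(frame)
         + WEIGHTS["zcr"] * zero_crossing_rate(frame)
         + BIAS)
    return 1.0 / (1.0 + math.exp(-z))

# A strong tonal frame (voiced-speech stand-in) versus a weak noise-only frame.
random.seed(1)
tone = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
noise = [random.gauss(0.0, 0.01) for _ in range(64)]
print(speech_presence_score(tone), speech_presence_score(noise))
```

A frame would then be labelled speech-present when the score exceeds a threshold such as 0.5; in the paper the weights come from training rather than manual tuning.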

  • Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News

    Toru IMAI  Shoei SATO  Shinichi HOMMA  Kazuo ONOE  Akio KOBAYASHI  

     
    PAPER-Speech and Hearing

      Vol: E90-D No: 8
      Page(s): 1286-1291

    This paper describes a new method for detecting speech segments online while identifying gender attributes, for efficient dual gender-dependent speech recognition and broadcast news captioning. The proposed online speech detection performs dual-gender phoneme recognition and detects a start-point and an end-point based on the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood, with a very small delay from the audio input. While obtaining the speech segments, the phoneme recognizer also identifies gender attributes with high discrimination in order to guide the subsequent dual-gender continuous speech recognizer efficiently. As soon as the start-point is detected, the continuous speech recognizer with parallel gender-dependent acoustic models starts a search and allows search transitions between male and female within a speech segment based on the gender attributes. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that the proposed speech detection method was effective in reducing the false rejection rate from 4.6% to 0.53%, and also in reducing recognition errors, in comparison with a conventional method using adaptive energy thresholds. It was also effective in identifying the gender attributes, with a correct rate of 99.7% of words. With the new speech detection and the gender identification, the proposed dual-gender speech recognition significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost feasible for real-time operation.
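
Endpoint detection from a cumulative likelihood ratio can be sketched roughly as below. The abstract does not give the exact rule, so this uses a CUSUM-style accumulator as a stand-in: per-frame log-likelihoods for speech and non-speech are assumed to be supplied by some recognizer, and the function name `detect_segment`, the reset-at-zero behaviour and the threshold value are all assumptions made for illustration.

```python
def detect_segment(speech_ll, nonspeech_ll, threshold=5.0):
    """Return (start, end) frame indices of the first detected speech segment.

    speech_ll / nonspeech_ll: per-frame log-likelihoods (assumed inputs).
    The end index is None if the segment has not ended by the last frame.
    """
    start = None
    cum = 0.0
    mark = 0  # candidate start before detection; last speech-like frame after it
    for t, (s, n) in enumerate(zip(speech_ll, nonspeech_ll)):
        # Accumulate evidence for speech first, then evidence for non-speech.
        cum += (s - n) if start is None else (n - s)
        if cum <= 0.0:
            cum = 0.0
            mark = t + 1 if start is None else t
        elif cum > threshold:
            if start is None:
                start, cum, mark = mark, 0.0, t
            else:
                return start, mark
    return start, None

# Five non-speech frames, ten speech frames, then ten non-speech frames.
speech_ll = [-3] * 5 + [-1] * 10 + [-3] * 10
nonspeech_ll = [-2] * 5 + [-3] * 10 + [-1] * 10
print(detect_segment(speech_ll, nonspeech_ll))  # (5, 14)
```

In this toy input the start-point is confirmed at frame 7 and the end-point at frame 17, so the decision lags the true boundaries by only a few frames, which is the kind of small delay the abstract refers to.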