1-6hit |
Traditional wavelet-based speech enhancement algorithms are ineffective in the presence of highly non-stationary noise because of the difficulties in the accurate estimation of the local noise spectrum. In this paper, a simple method of noise estimation employing the use of a voice activity detector is proposed. We can improve the output of a wavelet-based speech enhancement algorithm in the presence of random noise bursts according to the results of VAD decision. The noisy speech is first preprocessed using bark-scale wavelet packet decomposition ( BSWPD ) to convert a noisy signal into wavelet coefficients (WCs). It is found that the VAD using bark-scale spectral entropy, called as BS-Entropy, parameter is superior to other energy-based approach especially in variable noise-level. The wavelet coefficient threshold (WCT) of each subband is then temporally adjusted according to the result of VAD approach. In a speech-dominated frame, the speech is categorized into either a voiced frame or an unvoiced frame. A voiced frame possesses a strong tone-like spectrum in lower subbands, so that the WCs of lower-band must be reserved. On the contrary, the WCT tends to increase in lower-band if the speech is categorized as unvoiced. In a noise-dominated frame, the background noise can be almost completely removed by increasing the WCT. The objective and subjective experimental results are then used to evaluate the proposed system. The experiments show that this algorithm is valid on various noise conditions, especially for color noise and non-stationary noise conditions.
A novel long-term sub-band entropy (LT-SubEntropy) measure, which uses improved long-term spectral analysis and sub-band entropy, is proposed for voice activity detection (VAD). Based on the measure, we can accurately exploit the inherent nature of the formant structure on speech spectrogram (the well-known as voiceprint). Results show that the proposed VAD is superior to existing standard VAD methods at low SNR levels, especially at variable-level noise.
Conventional entropy measure is derived from full-band (range from 0 Hz to 4 kHz); however, it can not clearly describe the spectrum variability during voice-activity. Here we propose a novel concept of adaptive long-term sub-band entropy ( ALT-SubEnpy ) measure and combine it with a multi-thresholding scheme for voice activity detection. In detail, the ALT-SubEnpy measure developed with four part parameters of sub-entropy which uses different long-term spectral window length at each part. Consequently, the proposed ALT-SubEnpy -based algorithm recursively updates the four adaptive thresholds on each part. The proposed ALT-SubEnpy-based VAD method is shown to be an effective method while working at variable noise-level condition.
This study presents a fast adaptive algorithm for noise estimation in non-stationary environments. To make noise estimation adapt quickly to non-stationary noise environments, a robust entropy-based voice activity detection (VAD) is thus required. It is well-known that the entropy-based measure defined in spectral domain is very insensitive to the changing level of nose. To exploit the specific nature of straight lines existing on speech-only spectrogram, the proposed spectrum entropy measurement improved from spectrum entropy proposed by Shen et al. is further presented and is named band-splitting spectrum entropy (BSE). Consequently, the proposed recursive noise estimator including BSE-based VAD can update noise power spectrum accurately even if the noise-level quickly changes.
In this paper, we present an approach of detecting speech presence for which the decision rule is based on a combination of multiple features using a sigmoid function. A minimum classification error (MCE) training is used to update the weights adjustment for the combination. The features, consisting of three parameters: the ratio of ZCR, the spectral energy, and spectral entropy, are combined linearly with weights derived from the sub-band domain. First, the Bark-scale wavelet decomposition (BSWD) is used to split the input speech into 24 critical sub-bands. Next, the feature parameters are derived from the selected frequency sub-band to form robust voice feature parameters. In order to discard the seriously corrupted frequency sub-band, a strategy of adaptive frequency sub-band extraction (AFSE) dependant on the sub-band SNR is then applied to only the frequency sub-band used. Finally, these three feature parameters, which only consider the useful sub-band, are combined through a sigmoid type function incorporating optimal weights based on MSE training to detect either a speech present frame or a speech absent frame. Experimental results show that the performance of the proposed algorithm is superior to the standard methods such as G.729B and AMR2.
This letter introduces innovative VAD based on horizontal spectral entropy with long-span of time (HSELT) feature sets to improve mobile ASR performance in low signal-to-noise ratio (SNR) conditions. Since the signal characteristics of nonstationary noise change with time, we need long-term information of the noisy speech signal to define a more robust decision rule yielding high accuracy. We find that HSELT measures can horizontally enhance the transition between speech and non-speech segments. Based on this finding, we use the HSELT measures to achieve high accuracy for detecting speech signal form various stationary and nonstationary noises.