Chung-Chien HSU Kah-Meng CHEONG Tai-Shih CHI Yu TSAO
This paper proposes a voice activity detection (VAD) algorithm based on an energy-related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, developed to extract texture features of an audio signal from its Fourier spectrogram, is used to extract the frequency modulation features of the speech signal. The proposed algorithm labels the voice-active segments of the speech signal by comparing the energy-related feature of the frequency modulation of harmonics with a threshold. The proposed VAD is then implemented on a Texas Instruments (TI) digital signal processor (DSP) platform for real-time operation. Simulations conducted on the DSP platform demonstrate that the proposed VAD performs significantly better than three standard VADs (ITU-T G.729B, ETSI AMR1, and ETSI AMR2) in non-stationary noise, in terms of both the receiver operating characteristic (ROC) curves and the recognition rates of a practical distributed speech recognition (DSR) system.
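The threshold decision, and the way sweeping that threshold traces out an ROC curve, can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the per-frame feature here is a plain frame energy standing in for the paper's energy-related feature of harmonic frequency modulation (which requires the multi-resolution spectro-temporal analysis and is not reproduced), and the names `frame_energy`, `vad_labels`, `roc_point`, and `roc_curve` are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_energy(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude per frame.

    ASSUMPTION: a placeholder feature only, NOT the paper's energy-related
    feature of the frequency modulation of harmonics.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append(sum(s * s for s in frame) / frame_len)
    return feats


def vad_labels(features: Sequence[float], threshold: float) -> List[int]:
    """Label a frame 1 (voice active) when its feature exceeds the threshold."""
    return [1 if f > threshold else 0 for f in features]


def roc_point(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (false-alarm rate, detection rate) for one set of decisions."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    pos = sum(truth)
    neg = len(truth) - pos
    return (fp / neg if neg else 0.0, tp / pos if pos else 0.0)


def roc_curve(features: Sequence[float], truth: Sequence[int],
              thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """One ROC operating point per candidate threshold."""
    return [roc_point(vad_labels(features, th), truth) for th in thresholds]


if __name__ == "__main__":
    feats = [0.01, 0.02, 0.5, 0.8, 0.03, 0.6]   # hypothetical per-frame features
    truth = [0, 0, 1, 1, 0, 1]                  # hypothetical reference labels
    print(vad_labels(feats, 0.1))               # [0, 0, 1, 1, 0, 1]
    print(roc_point(vad_labels(feats, 0.55), truth))  # (0.0, 0.6666666666666666)
```

Raising the threshold trades detections for fewer false alarms; plotting the points from `roc_curve` over a grid of thresholds gives the kind of ROC comparison the abstract reports.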
Yu TSAO Ting-Yao HU Sakriani SAKTI Satoshi NAKAMURA Lin-shan LEE
This study proposes a variable selection linear regression (VSLR) adaptation framework that improves the accuracy of automatic speech recognition (ASR) when only a limited amount of unlabeled adaptation data is available. The framework consists of three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original set of regression variables. The second phase determines the best variable subset according to a predetermined performance evaluation criterion and computes a linear regression (LR) mapping function based on that subset. The third phase performs adaptation in either the model space or the feature space. Together, the three phases effectively select the optimal components of the LR mapping function and remove its redundancies, enabling VSLR to provide satisfactory adaptation performance even with a very limited amount of adaptation statistics. We formulate model-space VSLR and feature-space VSLR by integrating the variable selection (VS) techniques into conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model-space VSLR and feature-space VSLR outperform, respectively, standard maximum likelihood linear regression (MLLR) and feature-space MLLR (fMLLR), as well as their extensions, with notable word error rate (WER) reductions under per-utterance unsupervised adaptation.
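The first two phases (rank variables, build candidate subsets, pick the best subset, and fit an LR mapping on it) can be illustrated with a generic least-squares sketch. Every specific here is an assumption, not the paper's method: the ranking filter is taken to be the absolute Pearson correlation with the target, the candidate subsets are the nested top-k sets, the "performance evaluation criterion" is mean squared error on held-out data, and the mapping is fitted by ordinary least squares rather than the maximum-likelihood estimation used in MLLR/fMLLR. The third phase (applying the mapping to model means or features) is not shown. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (assumes A is non-singular)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_lr(X: Matrix, y: Sequence[float], cols: List[int]) -> List[float]:
    """Least squares with an intercept, using only the columns in `cols`."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(w: List[float], x: Sequence[float], cols: List[int]) -> float:
    return w[0] + sum(wi * x[c] for wi, c in zip(w[1:], cols))


def vslr_select(X_tr: Matrix, y_tr: Sequence[float], X_val: Matrix,
                y_val: Sequence[float], tol: float = 1e-9) -> Tuple[List[int], List[float]]:
    d = len(X_tr[0])
    # Phase 1 (assumed ranking filter): |Pearson correlation| with the target.
    scores = [abs(pearson([x[j] for x in X_tr], y_tr)) for j in range(d)]
    order = sorted(range(d), key=lambda j: -scores[j])
    subsets = [order[:k] for k in range(1, d + 1)]
    # Phase 2 (assumed criterion): validation MSE. A larger subset must beat
    # the current best by more than `tol`, so redundant variables are dropped.
    best_mse, best_cols, best_w = math.inf, [], []
    for cols in subsets:
        w = fit_lr(X_tr, y_tr, cols)
        mse = sum((predict(w, x, cols) - t) ** 2 for x, t in zip(X_val, y_val)) / len(y_val)
        if mse < best_mse - tol:
            best_mse, best_cols, best_w = mse, cols, w
    return best_cols, best_w


if __name__ == "__main__":
    # Hypothetical data: the target depends only on variable 0 (y = 2*x0 + 1).
    X_tr = [[0.0, 1.0], [1.0, -1.0], [2.0, 1.0], [3.0, -1.0]]
    y_tr = [1.0, 3.0, 5.0, 7.0]
    cols, w = vslr_select(X_tr, y_tr, [[4.0, 1.0], [5.0, -1.0]], [9.0, 11.0])
    print(cols, w)  # selects [0]; w is approximately [1.0, 2.0]
```

The tolerance in the subset comparison is a design choice of this sketch: when a smaller subset already explains the data, the redundant variable is not added, which mirrors the abstract's goal of removing redundancies from the mapping.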
Jin LI-YOU Ying-Ren CHIEN Yu TSAO
Reducing computational complexity effectively is an essential task in adaptive echo cancellation applications. Recently, a family of partial update (PU) adaptive algorithms has been proposed to reduce computational complexity. However, because a PU algorithm updates only a portion of the adaptive filter weights, its convergence rate is reduced. To address this issue, this paper proposes an enhanced switching-based variable step-size (ES-VSS) approach for the M-max PU least mean square (LMS) algorithm. The step-size is determined by the correlation between the error signals and their noise-free versions. The noise-free error signals are approximated according to the level of convergence achieved during the adaptation process, and the approximation switches among four modes so that the resulting step-size is as close to its optimal value as possible. Simulation results show that when only half of the taps are updated in each iteration, the proposed method significantly enhances the convergence rate of the M-max PU LMS algorithm.
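The baseline that ES-VSS accelerates, M-max PU LMS, can be sketched as follows. In its usual formulation, each iteration updates only the M taps whose regressor entries x(n - i) have the largest magnitudes. This sketch assumes a fixed step size `mu`, which is exactly where the paper's ES-VSS step size would be substituted; the four-mode approximation of the noise-free error is not described in enough detail here to reproduce, so it is omitted. The function names, the test system `h`, and all parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def mmax_pu_lms(x: Sequence[float], d: Sequence[float], num_taps: int,
                m: int, mu: float) -> Tuple[List[float], List[float]]:
    """M-max partial-update LMS with a FIXED step size (not ES-VSS).

    At each iteration only the m taps whose regressor entries x(n - i) have the
    largest magnitudes are updated. Returns (final weights, a priori errors).
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[i] holds x(n - i)
    errors: List[float] = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]                             # shift the delay line
        e = dn - sum(wi * bi for wi, bi in zip(w, buf))   # a priori error
        # Select the m largest-magnitude regressor entries. A full sort is used
        # for clarity; it costs more than the updates it saves, so a practical
        # implementation would maintain this ordering incrementally.
        selected = sorted(range(num_taps), key=lambda i: abs(buf[i]), reverse=True)[:m]
        for i in selected:
            w[i] += mu * e * buf[i]                       # update selected taps only
        errors.append(e)
    return w, errors


def identify(h: Sequence[float], n_samples: int, m: int, mu: float,
             noise_std: float = 0.0, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical system-identification run: the unknown FIR system h is
    driven by white Gaussian noise, with optional measurement noise added."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) + rng.gauss(0.0, noise_std)
         for n in range(n_samples)]
    return mmax_pu_lms(x, d, len(h), m, mu)


if __name__ == "__main__":
    h = [0.5, -0.3, 0.2, 0.1]                 # hypothetical echo path
    w, err = identify(h, 3000, m=2, mu=0.05)  # update half of the taps
    print([round(v, 4) for v in w])
```

Setting `m` equal to the number of taps recovers ordinary LMS; comparing the error sequences for `m=2` and `m=4` shows the slower convergence of partial updates that motivates a variable step size.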