Jung-In LEE Jeung-Yoon CHOI Hong-Goo KANG
Refinement methods for landmark detection and extraction of articulator-free features for a knowledge-based speech recognition system are described. Sub-band energy difference profiles are used to detect landmarks, with additional parameters used to improve accuracy. For articulator-free feature extraction, duration, relative energy, and silence detection are additionally used to find [continuant] and [strident] features. Vowel, obstruent and sonorant consonant landmarks, and locations of voicing onsets and offsets are detected within a unified framework with 85% accuracy overall. Additionally, 75% and 79% of [continuant] and [strident] features, respectively, are detected from landmarks.
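A minimal sketch of how a sub-band energy difference profile can flag abrupt-change (landmark) candidates. It assumes per-frame sub-band energies in dB have already been computed. The lag, the 9-dB threshold, the exact-frame voting across bands, and all function names (`difference_profile`, `pick_abrupt_changes`, `multi_band_events`) are hypothetical choices for illustration. The additional refinement parameters described in the abstract are not reproduced.

```python
from typing import Dict, List, Sequence, Tuple

Event = Tuple[int, str]  # (frame index, "+" for abrupt rise, "-" for abrupt fall)


def difference_profile(energies_db: Sequence[float], lag: int = 2) -> List[float]:
    """Energy difference of one sub-band: E[t] - E[t - lag] (0 for the first `lag` frames)."""
    return [0.0 if t < lag else energies_db[t] - energies_db[t - lag]
            for t in range(len(energies_db))]


def pick_abrupt_changes(profile: Sequence[float], threshold_db: float = 9.0) -> List[Event]:
    """Frames where |profile| is a local maximum and exceeds the threshold."""
    events: List[Event] = []
    for t in range(1, len(profile) - 1):
        v = profile[t]
        if abs(v) < threshold_db:
            continue
        # Strict rise on the left, non-strict on the right: a flat-topped
        # peak is reported once, at its first frame.
        if abs(v) > abs(profile[t - 1]) and abs(v) >= abs(profile[t + 1]):
            events.append((t, "+" if v > 0 else "-"))
    return events


def multi_band_events(band_energies_db: Sequence[Sequence[float]], lag: int = 2,
                      threshold_db: float = 9.0, min_bands: int = 2) -> List[Event]:
    """Keep events seen at the same frame, with the same sign, in >= min_bands bands."""
    votes: Dict[Event, int] = {}
    for energies in band_energies_db:
        for event in pick_abrupt_changes(difference_profile(energies, lag), threshold_db):
            votes[event] = votes.get(event, 0) + 1
    return sorted(e for e, n in votes.items() if n >= min_bands)
```

For example, two bands that jump from -40 dB to 0 dB at frame 10 and fall back at frame 20, plus one flat band, yield `[(10, '+'), (20, '-')]` from `multi_band_events`. Requiring agreement across bands is one simple way to suppress spurious single-band changes.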
Jung-In LEE Jeung-Yoon CHOI Hong-Goo KANG
There is steady demand for speech segmentation methods that can serve a variety of speech applications. Conventional segmentation algorithms perform reliably, but they require a sufficiently large training database. This letter proposes a manner class segmentation method based on the acoustic event and landmark detection used in a knowledge-based speech recognition system. Measurements of sub-band abruptness, together with additional parameters, are used to detect acoustic events. Manner class candidates are segmented at these acoustic events and then classified using knowledge of acoustic phonetics and acoustic parameters. The system segments six manner classes: vowel/glide, nasal, fricative, stop burst, stop closure, and silence. Overall, 71% of manner classes are correctly segmented within a 20-ms boundary error tolerance.
Yang-Won JUNG Hong-Goo KANG Chungyong LEE Dae-Hee YOUN Changkyu CHOI Jaywoo KIM
In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system consists of an adaptive array algorithm, a time-delay estimator, and the newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the AMC uses both temporal and spatial information. It operates in two processing stages: an initialization stage, which adopts a sound source localization technique, and a running stage, which uses a signal correlation characteristic. A generalized sidelobe canceller with an adaptive blocking matrix serves as the adaptive array algorithm. The proposed system is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13-dB SINR improvement with the speaker seated 2 m from the robot, and the speech recognition rate improves by 32% compared with a single-channel acquisition system.
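The sketch below shows a basic two-microphone generalized sidelobe canceller in textbook (Griffiths-Jim) form with an NLMS update, to make the structure named in the abstract concrete. It assumes the target has already been time-aligned across microphones (for example by a time-delay estimator). It uses a fixed difference blocking matrix, not the adaptive blocking matrix of the paper. The per-sample `adapt` flag is a hypothetical hook marking where an adaptation mode controller would decide whether the canceller may update. The function name and all parameter values are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def gsc_two_mic(x1: Sequence[float], x2: Sequence[float], taps: int = 4,
                mu: float = 0.02, adapt: Optional[Sequence[bool]] = None,
                eps: float = 1e-8) -> List[float]:
    """Two-microphone generalized sidelobe canceller (textbook form, hypothetical API).

    x1, x2: microphone signals, assumed already time-aligned toward the target.
    adapt:  optional per-sample flags; the adaptive filter updates only where True.
    """
    w = [0.0] * taps      # adaptive sidelobe-cancelling filter
    buf = [0.0] * taps    # recent blocking-matrix outputs, newest first
    out: List[float] = []
    for t, (a, b) in enumerate(zip(x1, x2)):
        fbf = 0.5 * (a + b)   # fixed delay-and-sum beamformer: aligned target passes
        blk = a - b           # fixed blocking matrix: aligned target cancels
        buf = [blk] + buf[:-1]
        # Subtract the filtered noise reference from the beamformer output.
        y = fbf - sum(wi * bi for wi, bi in zip(w, buf))
        if adapt is None or adapt[t]:
            norm = eps + sum(bi * bi for bi in buf)
            w = [wi + mu * y * bi / norm for wi, bi in zip(w, buf)]  # NLMS update
        out.append(y)
    return out
```

In this sketch, passing `adapt` flags that are `False` freezes the filter so the output reduces to the fixed beamformer. Updating while the target leaks into the blocking branch is what a mode controller is meant to prevent.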
Seung-Kyun RYU Hong-Goo KANG Sung-Kyo JUNG Dae-Hee YOUN
This paper proposes an algorithm that improves noise power spectrum estimation based on minimum statistics (MS). The minimum statistics noise estimator (MSNE), which is very efficient for speech enhancement, often underestimates the noise power when the signal characteristics change abruptly. The proposed algorithm improves the accuracy of noise estimation by removing the harmonic components of the speech signal. Simulation results verify that the proposed algorithm outperforms the conventional algorithm in terms of segmental SNR (SegSNR) and spectral distance (SD).
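A simplified minimum-statistics noise tracker for a single frequency bin, included to show the basic mechanism the abstract builds on. It smooths the periodogram recursively and takes the minimum over a sliding window of recent frames, scaled by a bias-compensation factor. The fixed smoothing constant, window length, constant bias factor, and the function name `min_stats_noise` are hypothetical simplifications (the full MS method adapts its smoothing and bias). The harmonic-removal step proposed in the paper is not reproduced.

```python
from collections import deque
from typing import Iterable, List


def min_stats_noise(periodogram: Iterable[float], alpha: float = 0.85,
                    window: int = 96, bias: float = 1.5) -> List[float]:
    """Noise PSD estimate for one frequency bin (simplified minimum statistics).

    periodogram: |Y(k, t)|**2 for successive frames t.
    alpha:  fixed first-order smoothing constant (hypothetical value).
    window: number of frames searched for the minimum (hypothetical value).
    bias:   constant factor compensating the downward bias of a minimum (hypothetical).
    """
    history: deque = deque(maxlen=window)  # last `window` smoothed values
    smoothed = None
    estimates: List[float] = []
    for p in periodogram:
        smoothed = p if smoothed is None else alpha * smoothed + (1.0 - alpha) * p
        history.append(smoothed)
        estimates.append(bias * min(history))
    return estimates
```

With `window=20` and `bias=1.0`, a step in noise power from 1.0 to 4.0 at frame 10 leaves the estimate near 1.0 until the old minimum leaves the window. This tracking delay is one way a minimum tracker can underestimate noise after an abrupt change.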
Sung-Kyo JUNG Hong-Goo KANG Dae-Hee YOUN
This letter presents the advantages of a cascaded algebraic codebook structure at relatively high bit-rates. The two-stage cascaded structure provides flexible pulse combinations owing to an additional gain term in the second stage. The perceptual quality of the cascaded structure can be further improved by a gain re-estimation scheme. Experiments confirm that the advantage of the cascaded structure in both quality and complexity grows as the bit-rate increases.
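A toy sketch of a two-stage (cascaded) pulse codebook search, illustrating how a second-stage gain adds flexibility. It assumes an identity weighting/synthesis filter and ignores the track structure of real algebraic codebooks. Each stage places unit-amplitude signed pulses at the largest-magnitude positions of its target and computes the least-squares gain. The second stage then codes the first stage's residual with its own gain. The function names and the greedy pulse selection are hypothetical simplifications, not the letter's search procedure.

```python
from typing import List, Sequence, Tuple


def search_stage(target: Sequence[float], num_pulses: int) -> Tuple[List[float], float]:
    """One stage: signed unit pulses at the largest |target| positions, plus optimal gain."""
    order = sorted(range(len(target)), key=lambda n: abs(target[n]), reverse=True)
    code = [0.0] * len(target)
    for n in order[:num_pulses]:
        code[n] = 1.0 if target[n] >= 0 else -1.0
    energy = sum(c * c for c in code)
    # Least-squares gain g = <x, c> / <c, c>.
    gain = sum(x * c for x, c in zip(target, code)) / energy if energy else 0.0
    return code, gain


def cascaded_search(target: Sequence[float], pulses_stage1: int,
                    pulses_stage2: int) -> Tuple[List[float], Tuple[float, float]]:
    """Two cascaded stages; the second codes the residual with its own gain."""
    c1, g1 = search_stage(target, pulses_stage1)
    residual = [x - g1 * c for x, c in zip(target, c1)]
    c2, g2 = search_stage(residual, pulses_stage2)
    excitation = [g1 * a + g2 * b for a, b in zip(c1, c2)]
    return excitation, (g1, g2)
```

For the target `[4, 0, 0, 1, 0, 0, 0, 0]`, a single two-pulse stage can only produce equal amplitudes (2.5 and 2.5). The cascaded search with gains 2.5 and 1.5 reproduces the target exactly, because the two gains let the pulses take different amplitudes.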
Yong-Soo CHOI Hong-Goo KANG Jae-Ha YOO Il-Whan CHA Dae-Hee YOUN
This paper describes a new Vector Sum Excited Linear Prediction (VSELP) coder with very low complexity. The method, called regular pulse VSELP (RP-VSELP), is based on mutually orthonormal regular pulse basis vectors. In this approach, a very efficient vector-sum codebook is constructed from a set of mutually orthonormal regular pulse basis vectors. This simplifies the codebook search without further degrading the synthesized speech relative to conventional VSELP. The regular pulse basis vectors are explicitly orthonormalized by means of the Gram-Schmidt procedure. To enhance the speech quality of the RP-VSELP coder, a perceptually weighted distortion measure between the input and synthesized speech is used in an iterative closed-loop training process for the regular pulse basis vectors, and this training is shown to improve speech quality. Experimental results demonstrate that the proposed method produces synthesized speech of quality comparable to that of conventional VSELP at 4.8 kbps.
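The sketch below illustrates why orthonormal basis vectors simplify a vector-sum codebook search. It assumes a unit-gain search directly against a target vector, with no synthesis or perceptual weighting filter, which is a simplification of a real VSELP search. With an orthonormal basis q_1..q_M and signs theta_m in {-1, +1}, ||x - sum_m theta_m q_m||^2 = ||x||^2 - 2 sum_m theta_m <x, q_m> + M. Each sign can therefore be chosen independently instead of searching all 2^M codevectors. The input vectors are made-up stand-ins for trained regular pulse basis vectors, and all function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors: Sequence[Sequence[float]], eps: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormal basis spanning `vectors`."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = dot(w, q)  # project the running remainder (modified GS)
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # drop (near-)linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


def vector_sum(basis: Sequence[Vector], signs: Sequence[int]) -> Vector:
    """Codevector u = sum_m theta_m * q_m with theta_m in {-1, +1}."""
    return [sum(s * q[n] for s, q in zip(signs, basis)) for n in range(len(basis[0]))]


def best_signs(target: Sequence[float], basis: Sequence[Vector]) -> List[int]:
    """Decoupled sign choice; optimal only because the basis is orthonormal."""
    return [1 if dot(target, q) >= 0 else -1 for q in basis]


def brute_force_signs(target: Sequence[float], basis: Sequence[Vector]) -> List[int]:
    """Reference search over all 2**M sign patterns, for checking."""
    def err(signs: Sequence[int]) -> float:
        u = vector_sum(basis, signs)
        return sum((x - ui) ** 2 for x, ui in zip(target, u))
    return list(min(itertools.product((-1, 1), repeat=len(basis)), key=err))
```

The decoupled search costs M inner products instead of 2^M error evaluations. This is the kind of complexity reduction that orthonormal basis vectors make possible.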
Bong-Jin LEE Chi-Sang JUNG Jeung-Yoon CHOI Hong-Goo KANG
This letter describes the importance of transition regions, such as those at phoneme boundaries, for automatic speaker recognition, compared with steady-state regions. Experimental results on automatic speaker identification tasks confirm that transition regions contain the most speaker-distinctive features. A possible explanation for these results is given in terms of articulation, in particular the degrees of freedom of the articulators. These results are expected to provide useful information for designing efficient automatic speaker recognition systems.