Tung-chin LEE Young-cheol PARK Dae-hee YOUN
In this paper, we propose a switchable linear prediction (LP)/warped linear prediction (WLP) hybrid scheme for the transform coded excitation (TCX) coder, which is adopted as a core codec in AMR-WB+ and USAC. The proposed algorithm selects either an LP or a WLP filter on a per-frame basis. To provide smooth transitions between LP and WLP frames, a window switching scheme is developed using sine and rectangular windows. In addition, a Gaussian Mixture Model (GMM)-based classification module is used to determine the prediction mode. A subjective listening test confirmed that the proposed LP/WLP switching scheme offers improved sound quality.
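Warped linear prediction replaces each unit delay of the predictor with a first-order allpass section, so the spectral resolution follows a warped frequency scale. A common way to obtain WLP coefficients, assumed here because the abstract does not state the procedure, is to compute a warped autocorrelation through a chain of allpass filters and then run the Levinson-Durbin recursion. This is a minimal sketch only: the warping coefficient `lam`, the test frame, and all function names are hypothetical, and the LP/WLP switching, window switching, and GMM classifier are not modeled. With `lam = 0` the allpass becomes a plain delay, so the same code yields ordinary LP.

```python
import math


def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1)."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        out = -lam * sample + x_prev + lam * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


def warped_autocorrelation(x, order, lam):
    """r[k] = sum_n x[n] * x_k[n], where x_k is x passed k times through the allpass."""
    r, xk = [], list(x)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(x, xk)))
        xk = allpass(xk, lam)
    return r


def levinson(r, order):
    """Levinson-Durbin: returns A(z) = 1 + a1 z^-1 + ... + ap z^-p and the prediction error."""
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Illustrative frame and warping coefficient (both arbitrary choices).
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(256)]
r = warped_autocorrelation(frame, order=8, lam=0.5)
a, err = levinson(r, order=8)
print(a, err)
```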
Yang-Won JUNG Hong-Goo KANG Chungyong LEE Dae-Hee YOUN Changkyu CHOI Jaywoo KIM
In this paper, an adaptive microphone array system with a two-stage adaptation mode controller (AMC) is proposed for high-quality speech acquisition in real environments. The proposed system includes an adaptive array algorithm, a time-delay estimator, and a newly proposed AMC. To ensure proper adaptation of the adaptive array algorithm, the proposed AMC uses both temporal and spatial information. The AMC consists of two processing stages: an initialization stage and a running stage. The initialization stage adopts a sound source localization technique, and the running stage uses a signal correlation characteristic. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed algorithm is implemented as a real-time man-machine interface module of a home-agent robot. Simulation results show a 13 dB SINR improvement with the speaker sitting 2 m from the home-agent robot. The speech recognition rate is also improved by 32% compared with a single-channel acquisition system.
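The system above includes a time-delay estimator, but the abstract does not say which one. As an assumption, the sketch below uses plain time-domain cross-correlation between two microphone signals and picks the lag with the largest correlation. The function name, signal length, and lag range are hypothetical; practical systems often use a frequency-weighted variant, which is not shown here.

```python
import random


def estimate_delay(ref, sig, max_lag):
    """Return the lag (samples) maximizing the cross-correlation sum_n ref[n] * sig[n + lag]."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(
            ref[n] * sig[n + lag]
            for n in range(len(ref))
            if 0 <= n + lag < len(sig)
        )
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


# Usage: the second microphone receives the same source 3 samples later.
random.seed(0)
mic1 = [random.gauss(0.0, 1.0) for _ in range(400)]
mic2 = [0.0, 0.0, 0.0] + mic1[:-3]
print(estimate_delay(mic1, mic2, max_lag=10))
```

A positive lag here means the second channel lags the first. Converting the lag to an arrival angle would additionally need the microphone spacing and sampling rate.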
Jeong-Hyeon YUN Young-Cheol PARK Dae-Hee YOUN Il-Whan CHA
An efficient active noise control algorithm based on the lattice-transversal joint (LTJ) filter structure is presented and applied to the active control of broadband noise in a three-dimensional enclosure. The algorithm implements the filtered-x LMS within the LTJ structure, which is obtained by cascading the lattice and transversal structures. Simulation results show that the LTJ-based noise control algorithm converges about as fast as the lattice-based algorithm while requiring less computation.
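For reference, the filtered-x LMS update that the LTJ structure builds on can be sketched with an ordinary transversal controller. This is not the LTJ algorithm itself. The primary path, secondary path, tap count, and step size below are hypothetical, and the secondary-path model is assumed to equal the true path. The controller weights are updated with the reference signal filtered through that model, which is what distinguishes FxLMS from plain LMS.

```python
import random


def fir(history, taps):
    """Dot product of the newest samples (history[0] is the most recent) with FIR taps."""
    return sum(h * t for h, t in zip(history, taps))


def fxlms(x, primary, secondary, n_taps=16, mu=0.01):
    """Single-channel filtered-x LMS with a transversal controller; returns the residual."""
    w = [0.0] * n_taps
    x_hist = [0.0] * max(n_taps, len(primary), len(secondary))
    y_hist = [0.0] * len(secondary)   # controller output, newest first
    xf_hist = [0.0] * n_taps          # reference filtered by the secondary-path model
    residual = []
    for sample in x:
        x_hist = [sample] + x_hist[:-1]
        d = fir(x_hist, primary)              # noise arriving at the error microphone
        y = fir(x_hist, w)                    # anti-noise driving the loudspeaker
        y_hist = [y] + y_hist[:-1]
        e = d - fir(y_hist, secondary)        # residual after the secondary path
        xf_hist = [fir(x_hist, secondary)] + xf_hist[:-1]  # model assumed exact here
        w = [wi + mu * e * xfi for wi, xfi in zip(w, xf_hist)]
        residual.append(e)
    return residual


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(6000)]
err = fxlms(noise, primary=[0.0, 0.8, 0.3, -0.2], secondary=[0.0, 0.9, 0.4])
early = sum(e * e for e in err[:200]) / 200
late = sum(e * e for e in err[-500:]) / 500
print(f"residual power: first 200 samples {early:.3f}, last 500 samples {late:.2e}")
```

A lattice front end, as in the LTJ structure, would decorrelate the reference before the transversal stage to speed up convergence. That part is omitted here.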
Tacksung CHOI Young-Cheol PARK Dae-Hee YOUN
Developing artificial reverberators with low memory requirements is important in applications such as mobile multimedia devices. One possibility is to use an All-Pass Filter (APF) embedded in the feedback loop of the comb filter network. In this paper, we propose a reverberator employing time-varying APFs to improve reverberation performance. By changing the gain of the APF, the number of frequency peaks can be perceptually increased. As a result, the reverberation sounds much more natural than that of the conventional approach, even with less memory. We perform theoretical and perceptual analyses of artificial reverberators employing time-varying APFs, and from these analyses we derive the degree of APF phase variation that is perceptually acceptable. Based on the analyses, we propose a method of designing artificial reverberators with time-varying APFs. Subjective tests show that the proposed method provides sound quality perceptually comparable to that of the conventional methods while using less memory.
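The structure described above, an APF inside the feedback loop of a comb filter with a time-varying APF gain, can be sketched as a single comb channel. All parameter values are hypothetical. The sinusoidal modulation of the gain is one possible choice and is not necessarily the paper's. The perceptually derived limit on phase variation is not reproduced here.

```python
import math


def modulated_allpass_comb(x, comb_delay=50, ap_delay=7, feedback=0.7,
                           ap_gain=0.5, mod_depth=0.1, mod_rate=5.0, fs=1000.0):
    """Comb filter whose feedback path contains a Schroeder allpass with a time-varying gain."""
    y, v = [], []  # y: comb output, v: allpass output inside the loop
    for n, xn in enumerate(x):
        # Time-varying APF gain (hypothetical sinusoidal modulation).
        a = ap_gain + mod_depth * math.sin(2.0 * math.pi * mod_rate * n / fs)
        fed_back = v[n - comb_delay] if n >= comb_delay else 0.0
        yn = xn + feedback * fed_back
        y.append(yn)
        u_d = y[n - ap_delay] if n >= ap_delay else 0.0
        v_d = v[n - ap_delay] if n >= ap_delay else 0.0
        # Allpass: v[n] = -a[n] y[n] + y[n - D] + a[n] v[n - D]
        v.append(-a * yn + u_d + a * v_d)
    return y


impulse = [1.0] + [0.0] * 2999
response = modulated_allpass_comb(impulse)
head = sum(s * s for s in response[:500])
tail = sum(s * s for s in response[-500:])
print(f"energy in first 500 samples {head:.3f}, last 500 samples {tail:.2e}")
```

With a constant gain the allpass has unit magnitude, so the decay is set by `feedback`. Modulating the gain moves the phase, and therefore the comb's resonance peaks, over time.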
Yangseok JEONG Heungryeol YOU Dae-Hee YOUN Chungyong LEE
In positioning systems, NLOS (Non-Line-of-Sight) errors invariably introduce a significant positive bias and directly increase range measurement errors. In this paper, a new method is proposed to calibrate NLOS errors in positioning systems by using the relationship between the mean excess delay and the delay spread measured at a mobile station. Computer simulations show that the proposed calibration technique effectively reduces the positioning error caused by urban NLOS environments.
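The calibration relies on two standard statistics of the power delay profile: the mean excess delay (first moment relative to the first arrival) and the RMS delay spread (square root of the second central moment). The sketch below computes only these two quantities. The abstract does not give the mapping from them to a range correction, so that mapping is not shown. The function name and example profiles are hypothetical.

```python
import math


def delay_statistics(delays, powers):
    """Mean excess delay and RMS delay spread of a power delay profile (same units as delays)."""
    total = sum(powers)
    first = min(d for d, p in zip(delays, powers) if p > 0)  # first arriving path
    excess = [d - first for d in delays]
    mean = sum(p * t for p, t in zip(powers, excess)) / total
    second = sum(p * t * t for p, t in zip(powers, excess)) / total
    return mean, math.sqrt(max(second - mean * mean, 0.0))


print(delay_statistics([0.0, 2.0], [1.0, 1.0]))  # (1.0, 1.0): two equal paths 2 units apart
print(delay_statistics([100.0, 150.0, 400.0], [1.0, 0.5, 0.2]))  # e.g. delays in ns
```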
Jae-seong LEE Young-cheol PARK Dae-hee YOUN Kyung-ok KANG
Although the AMR-WB+ coder provides excellent quality for speech signals, its coding model for music signals is less effective than that of HE-AAC v2. The main causes of the poor music quality of the AMR-WB+ TCX are non-critical sampling and blocking artifacts. The new TCX windowing scheme proposed in this paper uses an MDCT with 50% frame overlap, which significantly mitigates both problems. The long overlap introduces additional codec delay, but this delay is moderate for audio services. The results of objective and subjective tests indicate that the proposed scheme achieves noticeable quality improvements for music signals over previous TCX schemes.
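The reason a 50%-overlapped MDCT avoids both non-critical sampling and blocking artifacts is time-domain aliasing cancellation (TDAC). Each frame of 2N samples produces only N coefficients (critical sampling), and the aliasing left by the inverse transform cancels when neighbouring frames are overlap-added. The sketch below is a generic direct-form MDCT round trip with a sine window, not the paper's TCX windowing. Function names and sizes are hypothetical, and the inverse is scaled by 2/N so that a window satisfying w[n]^2 + w[n+N]^2 = 1 reconstructs exactly.

```python
import math
import random


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley) for length 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """2N windowed samples -> N coefficients (direct O(N^2) form)."""
    n_half = len(frame) // 2
    n0 = 0.5 + n_half / 2.0
    return [sum(x * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n, x in enumerate(frame)) for k in range(n_half)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 2/N to pair with the sine window."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2.0
    return [2.0 / n_half * sum(c * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                               for k, c in enumerate(coeffs)) for n in range(2 * n_half)]


def mdct_round_trip(signal, n_half):
    """Analyze/synthesize with 50%-overlapped frames; aliasing cancels in overlap-add."""
    w = sine_window(2 * n_half)
    pad_back = n_half + (-len(signal)) % n_half
    padded = [0.0] * n_half + list(signal) + [0.0] * pad_back
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):  # hop = N (50% overlap)
        frame = [padded[start + i] * w[i] for i in range(2 * n_half)]
        for i, s in enumerate(imdct(mdct(frame))):
            out[start + i] += s * w[i]
    return out[n_half:n_half + len(signal)]


random.seed(2)
x = [random.uniform(-1.0, 1.0) for _ in range(37)]
y = mdct_round_trip(x, n_half=8)
print(max(abs(a - b) for a, b in zip(x, y)))  # close to 0 (floating-point error only)
```

The extra codec delay mentioned above comes from the overlap: a frame cannot be fully reconstructed until the next overlapping frame has been decoded.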
Seung-Kyun RYU Hong-Goo KANG Sung-Kyo JUNG Dae-Hee YOUN
This paper proposes an algorithm that improves noise power spectrum estimation based on minimum statistics (MS). The minimum statistics noise estimator (MSNE), which is highly efficient for speech enhancement, often underestimates the noise power when the signal characteristics change abruptly. The proposed algorithm improves the accuracy of noise estimation by removing the harmonic components of the speech signal. Simulation results verify that the proposed algorithm outperforms the conventional algorithm in terms of segmental SNR (SegSNR) and spectral distance (SD).
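The core idea of minimum statistics can be sketched for a single frequency bin. The periodogram is smoothed recursively, the minimum of the smoothed power over a sliding window is tracked, and that minimum is scaled by a bias-compensation factor. This is a deliberately simplified version. The fixed smoothing constant `alpha`, window length, and bias value are hypothetical, and neither the adaptive smoothing of the full MSNE nor the paper's harmonic-removal step is included.

```python
from collections import deque


def min_statistics(power, alpha=0.85, window=20, bias=1.5):
    """Per-bin noise estimate: bias * minimum of the recursively smoothed power over a window."""
    smoothed = None
    recent = deque(maxlen=window)   # last `window` smoothed values
    estimates = []
    for p in power:
        smoothed = p if smoothed is None else alpha * smoothed + (1.0 - alpha) * p
        recent.append(smoothed)
        estimates.append(bias * min(recent))
    return estimates


# One frequency bin: stationary noise of power 1.0 with a short speech burst.
frames = [1.0] * 30 + [100.0] * 5 + [1.0] * 30
noise_est = min_statistics(frames)
print(noise_est[32])  # during the burst the estimate stays at bias * noise power
```

Because the burst is shorter than the window, the tracked minimum ignores it. The paper's concern is the opposite case, in which abrupt changes make the minimum an underestimate.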
Sang-Wook PARK Seung-Kyun RYU Dae-Hee YOUN
A new objective speech quality measure, the Bark Coherence Function, is presented. The Coherence Function has been used to evaluate the non-linear distortion of low-to-medium rate speech coders. However, it is not well suited to quality estimation in modern speech transmission, especially in CDMA mobile communication systems. In the proposed method, the Coherence Function is redefined in the psychoacoustic domain as the cognition module of a perceptual speech quality measure and evaluates the perceptual non-linear distortion of mobile systems. Experimental results show that the proposed method performs well for CDMA PCS and digital cellular systems.
Tae-Young YANG Chungyong LEE Dae-Hee YOUN
A duration modeling technique is proposed for HMM-based connected digit recognizers. The technique uses a cumulative duration probability, defined as the partial sum of the duration probabilities, which can be estimated from the training speech data. Two approaches to using it are presented. In the first, the cumulative duration probability is used as a weighting factor for the HMM state transition probability; in the second, it replaces the conventional state transition probability. In both approaches, the cumulative duration probability is incorporated directly into the Viterbi decoding procedure, and a modified Viterbi decoding procedure is also presented. One advantage of the proposed technique is that the cumulative duration probability governs state and word transitions at each frame, so no additional post-processing is required. The technique was evaluated in recognition experiments on Korean connected digits. The two approaches achieved almost the same performance, and the average recognition accuracy improved from 83.60% to 93.12%.
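The cumulative duration probability is the partial sum of an empirical duration distribution: F(d) = sum over k <= d of p(k), where p(k) is the relative frequency of a state (or word) lasting k frames in the training data. The sketch computes F from a list of observed durations. The weighting function is a hypothetical illustration of how F(d) could scale an exit transition inside Viterbi decoding; the abstract does not give the exact formula, and the function names are made up for this example.

```python
from collections import Counter


def cumulative_duration_probability(durations):
    """F(d) = sum_{k <= d} p(k), with p(k) the relative frequency of duration k (frames)."""
    counts = Counter(durations)
    total = len(durations)
    cdf, running = {}, 0.0
    for d in range(1, max(durations) + 1):
        running += counts.get(d, 0) / total
        cdf[d] = running
    return cdf


def weighted_exit_probability(a_exit, d, cdf):
    """Hypothetical weighting: scale the exit transition by F(d) after d frames in the state."""
    return a_exit * cdf.get(d, 1.0)  # beyond the longest observed duration, F(d) = 1


cdf = cumulative_duration_probability([1, 2, 2, 3, 3, 3])
print(cdf)
print(weighted_exit_probability(0.4, 2, cdf))
```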
Ki-Seung LEE Won DOH Dae-Hee YOUN
In this paper, a new voice personality transformation algorithm that uses the vocal tract characteristics and pitch period as feature parameters is proposed. The vocal tract transfer function is divided into time-invariant and time-varying parts. Conversion rules for the time-varying part are constructed by a classified linear transformation matrix based on soft-clustering techniques for the LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. An excitation signal containing prosodic information is transformed by the average pitch ratio. To improve naturalness, the transformation of the excitation signal is applied separately to voiced and unvoiced bands so that the overall spectral structure is preserved. Objective tests show that the distance between the LPC cepstrum of the target speaker and that of speech synthesized with the proposed method is about 70% smaller than the distance between the target and source speakers' LPC cepstra. Subjective listening tests also show that 60-70% of listeners identify the transformed speech as the target speaker's.
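Two of the ingredients named above can be sketched directly. The first is the standard recursion from LPC coefficients to the LPC cepstrum of 1/A(z). The second is scaling pitch periods by the ratio of average target and source periods. The KL-coefficient representation, soft clustering, and voiced/unvoiced band processing are not shown. Function names and example values are hypothetical, and the sign convention A(z) = 1 + sum a_k z^-k is an assumption.

```python
def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstrum of 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k, via
    c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}   (a_j = 0 for j > p)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def scale_pitch_periods(periods, source_mean, target_mean):
    """Scale every pitch period by the average target/source pitch-period ratio."""
    ratio = target_mean / source_mean
    return [p * ratio for p in periods]


print(lpc_to_cepstrum([-0.5], 3))   # single pole at z = 0.5: c_n = 0.5**n / n
print(scale_pitch_periods([100.0, 110.0], source_mean=105.0, target_mean=84.0))
```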
Joonsung LEE Changheon OH Chungyong LEE Dae-Hee YOUN
A new beamforming method based on the simplex downhill optimization process is presented for reverse-link CDMA systems. The proposed system performs code filtering at each antenna for each user. The new beamforming method requires less computation and converges faster than existing algorithms. Simulation results show that the proposed algorithm achieves better BER performance in time-varying channels.
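The simplex downhill (Nelder-Mead) method minimizes a cost function using only function evaluations. It repeatedly reflects, expands, contracts, or shrinks a simplex of n + 1 points. The abstract does not give the beamforming cost that is minimized, so this sketch runs the optimizer on a toy quadratic. In a beamformer the search vector would presumably hold the (real and imaginary parts of the) array weights, which is an assumption. The coefficients used (reflection 1, expansion 2, contraction and shrink 0.5) are the textbook defaults, not values from the paper.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-12, max_iter=5000):
    """Minimize f over R^n with the Nelder-Mead downhill simplex (standard coefficients)."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:                       # try expanding further
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:                    # accept the reflection
            simplex[-1], values[-1] = reflected, f_r
        else:                                     # contract toward the centroid
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:                                 # shrink everything toward the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Toy stand-in for a beamforming cost: a quadratic bowl with its minimum at (1, -2).
best, cost = nelder_mead(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2, [0.0, 0.0])
print(best, cost)
```

Because no gradients are needed, each iteration costs only one or two cost evaluations in the common case. This is consistent with, though not proof of, the low computation the abstract reports.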
Sung-Kyo JUNG Hong-Goo KANG Dae-Hee YOUN
This letter presents the advantages of a cascaded algebraic codebook structure at relatively high bit-rates. The two-stage cascaded structure provides flexible pulse combinations owing to an additional gain term in the second stage. The perceptual quality of the cascaded structure can be further improved by a gain re-estimation scheme. Experiments confirm that the cascaded structure has a significant advantage in terms of quality and complexity at higher bit-rates.
Jeong-Pyo HAM Tae-Young YANG Chungyong LEE Dae-Hee YOUN
In this letter, we propose a grammatical structure of the finite state network (FSN) for the recognition of Korean price sentences. It is implemented by arranging the nodes and arcs of the FSN. Two kinds of grammatical structures are presented, both designed according to the grammar constraints of Korean price sentences. These constraints are similar to those of English price sentences: the unit is placed after the digit, several digits form a basic group, the basic group appears recursively followed by meta-units, and so on. Speaker-independent recognition experiments were conducted, and the results of the FSNs with the proposed grammatical structures were compared with those of an FSN without grammatical structure.
Tacksung CHOI Sunkuk MOON Young-cheol PARK Dae-hee YOUN Seokpil LEE
In this paper, we propose a new feature selection algorithm for multi-class classification. The proposed algorithm is based on Gaussian mixture models (GMMs) of the features, and it uses the distance between the two least separable classes as a metric for feature selection. The proposed system was tested with a support vector machine (SVM) for multi-class classification of music. Results show that the proposed feature selection scheme is superior to conventional schemes.
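The selection criterion above scores each feature by how well it separates the two *least* separable classes, so a feature that separates most classes but merges two of them scores poorly. The sketch below uses a single Gaussian per class instead of a GMM and the Bhattacharyya distance as the class distance. Both are simplifying assumptions, since the abstract does not name the distance. The data layout and function names are hypothetical, and every class must have non-zero variance.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2)."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))


def worst_pair_distance(samples_by_class):
    """Distance between the two least separable classes for one feature."""
    stats = [(mean(s), pvariance(s)) for s in samples_by_class]
    return min(bhattacharyya_1d(*a, *b) for a, b in combinations(stats, 2))


def select_features(data, n_select):
    """data[f][c] = samples of feature f for class c; keep features with the largest worst-pair distance."""
    scores = [worst_pair_distance(per_class) for per_class in data]
    return sorted(range(len(data)), key=lambda f: scores[f], reverse=True)[:n_select]


data = [
    [[0.0, 0.1, -0.1], [5.0, 5.1, 4.9], [10.0, 10.1, 9.9]],     # feature 0 separates all classes
    [[0.0, 0.1, -0.1], [0.05, 0.15, -0.05], [10.0, 10.1, 9.9]],  # feature 1: classes 0 and 1 overlap
]
print(select_features(data, 1))
```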
Tae-Young YANG Chungyong LEE Dae-Hee YOUN
A speaker adaptation technique that maximizes the observation probability of the input speech is proposed and applied to semi-continuous hidden Markov model (SCHMM) speech recognizers. The proposed algorithm iteratively adapts the mean µ and the covariance Σ by gradient search so that the features of the adaptation speech data achieve maximum observation probabilities. The mixture coefficients and the state transition probabilities are adapted by a model interpolation scheme. The main advantage of this scheme is that the means and variances, which are shared by all states in the SCHMM, are adapted independently of the other SCHMM parameters. This allows fast and precise adaptation, especially when there is a large acoustic mismatch between the reference model and a new speaker. The scheme could also be applied to other areas that use codebooks. The proposed adaptation algorithm was evaluated with male speaker-dependent, female speaker-dependent, and speaker-independent recognizers. Isolated word recognition experiments showed average improvements of 46.03% for the male speaker-dependent recognizer, 52.18% for the female speaker-dependent recognizer, and 9.84% for the speaker-independent recognizer.
Yong-Soo CHOI Hong-Goo KANG Jae-Ha YOO Il-Whan CHA Dae-Hee YOUN
This paper describes a new Vector Sum Excited Linear Prediction (VSELP) coder with very low complexity. The method, called regular pulse VSELP (RP-VSELP), is based on mutually orthonormal regular pulse basis vectors. In this approach, a very efficient vector-sum codebook is constructed from a set of mutually orthonormal regular pulse basis vectors, which simplifies the codebook search without additional degradation of the synthesized speech compared with the conventional VSELP. The regular pulse basis vectors are explicitly orthonormalized by means of the Gram-Schmidt procedure. To enhance the speech quality of the RP-VSELP coder, a perceptually weighted distortion measure between the input and the synthesized speech is used in an iterative closed-loop training process for the regular pulse basis vectors. It is shown that the training process improves speech quality. Experimental results demonstrate that the proposed method produces synthesized speech quality comparable to that of the VSELP scheme at a bit-rate of 4.8 kbps.
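The codebook construction described above can be sketched in two steps. First, orthonormalize a few regular-pulse basis vectors with Gram-Schmidt. Second, form all 2^M vector-sum codevectors u = sum of theta_m v_m with each theta_m in {-1, +1}. One consequence the sketch checks is that, with an orthonormal basis, every codevector has the same energy M. The example basis vectors are hypothetical, and the perceptually weighted closed-loop training is not modeled. Because pulses on the odd grid are orthogonal to pulses on the even grid, the orthonormalized vectors keep their regular-pulse support here.

```python
import itertools
import math


def gram_schmidt(vectors):
    """Orthonormalize vectors in order (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def vector_sum_codebook(basis):
    """All 2**M codevectors sum_m theta_m * v_m with theta_m in {-1, +1}."""
    length = len(basis[0])
    return [[sum(s * b[i] for s, b in zip(signs, basis)) for i in range(length)]
            for signs in itertools.product((-1.0, 1.0), repeat=len(basis))]


# Hypothetical regular-pulse vectors: pulses on the even or odd sample grid.
raw = [[1, 0, 2, 0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0, 2, 0, 0],
       [1, 0, 1, 0, 0, 0, 1, 0]]
basis = gram_schmidt(raw)
book = vector_sum_codebook(basis)
print(len(book), [round(sum(x * x for x in u), 6) for u in book])
```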
Jae-woong JEONG Young-cheol PARK Dae-hee YOUN Seok-Pil LEE
In this paper, we propose a robust room inverse filtering algorithm for speech dereverberation based on kurtosis maximization. The proposed algorithm uses a new normalized kurtosis function that nonlinearly maps the input kurtosis onto a finite range from zero to one, which results in kurtosis warping. Owing to this warping, the proposed algorithm provides more stable convergence and, in turn, better performance than the conventional algorithm. Experimental results confirm the robustness of the proposed algorithm.
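The normalized kurtosis E[x^4]/E[x^2]^2 of a zero-mean signal is at least 1 and unbounded above. This unboundedness is why a warping onto a finite range can stabilize a kurtosis-maximizing update. The abstract does not give the paper's warping function. The map 1 - 1/kurtosis below is a hypothetical example with the stated property (range [0, 1), monotonically increasing) and should not be read as the authors' function.

```python
def kurtosis(x):
    """Normalized fourth moment E[x^4] / E[x^2]^2 of a zero-mean signal (always >= 1)."""
    m2 = sum(v * v for v in x) / len(x)
    m4 = sum(v ** 4 for v in x) / len(x)
    return m4 / (m2 * m2)


def warped_kurtosis(x):
    """Hypothetical warping: map kurtosis in [1, inf) onto [0, 1) via 1 - 1/kurtosis."""
    return 1.0 - 1.0 / kurtosis(x)


print(warped_kurtosis([1.0, -1.0, 1.0, -1.0]))  # 0.0: binary signal, kurtosis 1
print(warped_kurtosis([0.0, 0.0, 0.0, 1.0]))    # 0.75: sparse signal, kurtosis 4
```

Reverberation tends to make speech less sparse (lower kurtosis), so an inverse filter that increases the warped value pushes the output toward the sparser, dry signal.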
Jae-Seong LEE Chang-Joon LEE Young-Cheol PARK Dae-Hee YOUN
This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients from MDCT and MDST coefficients through circular convolution. The computational complexity associated with the MDCT and MDST coefficients is approximately half that of the original FFT. We also design a new PAM based on the proposed FFT algorithm; it has 15% lower computational complexity than the original PAM without degrading sound quality. Subjective and objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
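A related identity shows why MDCT and MDST coefficients together carry full spectral information. With C[k] the MDCT and S[k] the MDST of the same 2N-sample frame, C[k] - jS[k] equals the half-bin-shifted ("odd") DFT of the frame, up to a known phase factor. The sketch verifies this identity numerically. It does not reproduce the paper's circular-convolution step, which maps these values onto the regular integer-bin FFT used by the PAM. The MDST sign convention and the function names are assumptions.

```python
import cmath
import math
import random


def mdct_mdst(frame):
    """MDCT (cosine) and MDST (sine) of one 2N-sample frame with the usual phase offset n0."""
    n = len(frame) // 2
    n0 = 0.5 + n / 2.0
    args = [[math.pi / n * (i + n0) * (k + 0.5) for i in range(2 * n)] for k in range(n)]
    c = [sum(x * math.cos(t) for x, t in zip(frame, row)) for row in args]
    s = [sum(x * math.sin(t) for x, t in zip(frame, row)) for row in args]
    return c, s


def odd_dft_from_mdct_mdst(c, s):
    """X[k + 1/2] = sum_i x[i] exp(-j*2*pi*i*(k + 1/2)/(2N)), rebuilt as (C - jS) * phase."""
    n = len(c)
    n0 = 0.5 + n / 2.0
    return [(ck - 1j * sk) * cmath.exp(1j * math.pi * n0 * (k + 0.5) / n)
            for k, (ck, sk) in enumerate(zip(c, s))]


random.seed(4)
frame = [random.uniform(-1.0, 1.0) for _ in range(16)]
c, s = mdct_mdst(frame)
fast = odd_dft_from_mdct_mdst(c, s)
direct = [sum(x * cmath.exp(-2j * math.pi * i * (k + 0.5) / 16) for i, x in enumerate(frame))
          for k in range(8)]
print(max(abs(a - b) for a, b in zip(fast, direct)))  # floating-point error only
```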
Jae-woong JEONG Young-cheol PARK Dae-hee YOUN
This paper presents an approximated virtual source imaging system based on crosstalk cancellation with a pair of closely spaced loudspeakers. Utilizing the frequency-dependent relative importance of sound localization cues, the proposed system provides separate approximations for the low- and high-frequency bands. Experimental results show that the system provides good approximations within ±55° in the stereo dipole setup with natural sound quality.
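Crosstalk cancellation is usually formulated per frequency bin. If H is the 2x2 matrix of loudspeaker-to-ear transfer functions, the canceller C should make H C close to the identity so that each ear receives only its intended signal. The sketch computes the common regularized inverse C = (H^H H + beta I)^-1 H^H for one bin; beta = 0 gives the exact inverse. The transfer values and `beta` are hypothetical, and the paper's separate low- and high-frequency approximations are not reproduced.

```python
import cmath


def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def herm(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def inv2(a):
    """Inverse of a 2x2 matrix via the adjugate."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def ctc_filters(h, beta=0.0):
    """C = (H^H H + beta I)^-1 H^H for one frequency bin; beta = 0 gives C = H^-1."""
    hh = herm(h)
    g = matmul2(hh, h)
    g = [[g[0][0] + beta, g[0][1]], [g[1][0], g[1][1] + beta]]
    return matmul2(inv2(g), hh)


cross = 0.6 * cmath.exp(-0.8j)          # hypothetical contralateral (crosstalk) path
h = [[1.0 + 0j, cross], [cross, 1.0 + 0j]]
c = ctc_filters(h)
print(matmul2(h, c))                    # approximately the 2x2 identity
```

A positive `beta` trades exact cancellation for smaller filter gains. This matters for closely spaced loudspeakers, where H is poorly conditioned at low frequencies.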
Tung-chin LEE Young-cheol PARK Dae-hee YOUN
This paper proposes a method of improving the performance of blind reverberation time (RT) estimation in noisy environments. RT estimation is performed using a maximum likelihood (ML) method based on the autocorrelation function of the linear predictive residual signal. To reduce the effect of environmental noise, a noise reduction technique is applied to the noisy speech signal. In addition, frequency coefficient selection is performed to eliminate signal components with a low signal-to-noise ratio (SNR). Experimental results confirm that the proposed method improves the accuracy of RT estimates, particularly when the speech signal is corrupted by narrowband colored noise.
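The ML idea behind blind RT estimation can be illustrated with the classic model of a free decay as exponentially damped Gaussian noise, y[n] = a^n v[n]. For each candidate decay rate, the noise variance has a closed-form ML value, and substituting it leaves a one-dimensional log-likelihood to maximize. This sketch applies that model directly to a synthetic decay with a grid search. It does not use the LP residual autocorrelation, noise reduction, or frequency selection described above. The sampling rate, grid, and function names are hypothetical (a low rate keeps the example fast).

```python
import math
import random


def decay_from_rt60(rt60, fs):
    """Per-sample amplitude decay a such that a**(rt60 * fs) = 10**-3 (a 60 dB drop)."""
    return 10.0 ** (-3.0 / (rt60 * fs))


def ml_rt60(y, fs, rt_grid):
    """Grid-search ML estimate of RT60 under y[n] = a**n * v[n], v[n] ~ N(0, sigma^2)."""
    n_len = len(y)
    best_rt, best_ll = None, float("-inf")
    for rt in rt_grid:
        log_a = math.log(decay_from_rt60(rt, fs))
        # ML noise variance for this decay, then the concentrated log-likelihood.
        sigma2 = sum((yn * math.exp(-n * log_a)) ** 2 for n, yn in enumerate(y)) / n_len
        ll = -0.5 * n_len * math.log(sigma2) - log_a * n_len * (n_len - 1) / 2.0
        if ll > best_ll:
            best_rt, best_ll = rt, ll
    return best_rt


random.seed(3)
fs, true_rt = 1000.0, 0.5
a = decay_from_rt60(true_rt, fs)
decay = [a ** n * random.gauss(0.0, 1.0) for n in range(500)]
grid = [0.1 + 0.005 * i for i in range(281)]   # candidate RT60 values in seconds
estimate = ml_rt60(decay, fs, grid)
print(estimate)
```

In real recordings, additive noise flattens the tail of the decay and biases this likelihood toward longer RTs. That is the problem the noise reduction and low-SNR coefficient removal above are meant to address.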