JooHun LEE MyungJin BAE Souguil ANN
A fast pitch search algorithm using a skipping technique is proposed to reduce the computation time of the CELP vocoder. Based on the characteristics of the correlation function of the speech signal, the proposed algorithm skips certain ranges within the full pitch search range in a simple way. Although the search range is reduced, high speech quality is maintained, since lags with high correlation values are not skipped and remain available to the closed-loop analysis. To improve the efficiency of the proposed method, we develop three variants of the skipping technique. Experimental results show that the proposed algorithm and its variants reduce the pitch-search computation time considerably, by more than 60% compared with the traditional full search method.
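The following is a minimal sketch of the open-loop pre-selection idea described above, not the paper's exact procedure: the normalized autocorrelation is computed over the full lag range, and only lags whose score clears a threshold survive for the expensive closed-loop search. The lag range, the `keep_ratio` threshold, and the thresholding rule itself are illustrative assumptions.

```python
import numpy as np

def candidate_pitch_lags(frame, lag_min=20, lag_max=147, keep_ratio=0.4):
    """Open-loop pre-selection of pitch lags (illustrative sketch).

    frame: 1-D speech frame with len(frame) > lag_max.
    Returns the lags kept for closed-loop analysis-by-synthesis.
    """
    scores = {}
    for lag in range(lag_min, lag_max + 1):
        # Normalized autocorrelation at this lag.
        x, y = frame[lag:], frame[:-lag]
        denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
        scores[lag] = np.dot(x, y) / denom if denom > 0 else 0.0
    # Assumed skipping rule: drop lags well below the best open-loop
    # score; only the surviving lags are searched in closed loop.
    cutoff = keep_ratio * max(scores.values())
    return sorted(lag for lag, s in scores.items() if s >= cutoff)
```

The saving comes from running the closed-loop search only over the returned subset instead of the full lag range; how the threshold is chosen determines the trade-off between speedup and speech quality.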
Jinyoung KIM Joohun LEE Katsuhiko SHIRAI
In this paper, we propose an efficient (smaller feature size) and robust (better recognition under different lighting conditions) method for real-time, image-transform-based automatic lip-reading under illumination variations. Image-transform-based approaches obtain a compressed representation of the pixel values of the speaker's mouth region and have been reported to give superior lip-reading performance. However, they inevitably produce large lip-feature vectors, requiring substantial computation time even when principal component analysis (PCA) is applied. To reduce the feature dimension, the proposed method folds the lip image in each frame based on its left-right symmetry; the folding also compensates for unbalanced illumination between the left and right lip areas. Additionally, to filter out the inter-frame spectral distortion of each pixel contaminated by illumination noise, our method applies high-pass filtering to the variations of pixel values between consecutive frames. In experiments on a database recorded under various lighting conditions, the proposed lip folding and/or inter-frame filtering greatly reduced the number of required features (principal components in this work) and achieved a higher recognition rate than the conventional method.
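A minimal sketch of the two preprocessing steps named above, under assumptions not stated in the abstract: the mouth region is roughly centered on its vertical symmetry axis, and a first-order recursive filter (with an assumed coefficient `alpha`) stands in for the paper's high-pass filter. Both function names are hypothetical.

```python
import numpy as np

def fold_lip_image(frame):
    """Fold a mouth-region image about its vertical symmetry axis.

    Averaging the left half with the mirrored right half halves the
    pixel count fed to PCA and evens out left/right illumination
    imbalance. Assumes the mouth is centered in the frame.
    """
    frame = frame.astype(float)
    h, w = frame.shape
    left = frame[:, : w // 2]
    right = np.fliplr(frame[:, w - w // 2:])
    return (left + right) / 2.0

def highpass_pixels(frames, alpha=0.95):
    """First-order temporal high-pass filter applied per pixel.

    frames: array of shape (T, H, W). Suppresses slowly varying
    illumination components across consecutive frames; alpha is an
    assumed coefficient, not taken from the paper.
    """
    out = np.zeros(frames.shape, dtype=float)
    prev_x = np.zeros(frames.shape[1:])
    prev_y = np.zeros(frames.shape[1:])
    for t in range(len(frames)):
        # y[t] = alpha * (y[t-1] + x[t] - x[t-1])
        prev_y = alpha * (prev_y + frames[t] - prev_x)
        prev_x = frames[t].astype(float)
        out[t] = prev_y
    return out
```

In this reading of the method, folding is applied per frame and the temporal filter runs on the folded pixel sequences before PCA extracts the principal components used as lip-reading features.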
Jinyoung KIM Joohun LEE Katsuhiko SHIRAI
In this paper, we propose a corpus-based lip-sync algorithm for natural face animation. For this purpose, we constructed a Korean audio-visual (AV) corpus and, based on it, propose a concatenation method for AV units similar to a corpus-based text-to-speech system. To build the AV corpus, lip-related parameters were extracted from every video-recorded facial shot in which a speaker reads texts selected from newspapers. The spoken utterances were labeled with HTK, and prosodic information such as duration, pitch, and intensity was extracted as lip-sync parameters. Based on the constructed AV corpus, the basic synthesis units are CVC syllables. For the best concatenation performance, the optimal unit sequence is estimated by a standard Viterbi search over the phonetic-environment distance and the prosodic distance. Computer simulation results show that not only duration but also pitch and intensity information is useful for enhancing lip-sync performance, and that the reconstructed lip parameters closely match the original ones.
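A minimal sketch of the unit-selection step: a generic Viterbi search over per-syllable candidate units, where `target_cost` stands in for the combined phonetic-environment and prosodic distances and `concat_cost` for the join cost between adjacent units. Both cost functions and their weighting are placeholders; the abstract does not specify them.

```python
import numpy as np

def viterbi_unit_selection(candidates, target_cost, concat_cost):
    """Select the lowest-cost sequence of AV units (illustrative sketch).

    candidates: list over target CVC syllables; candidates[i] is the
        list of corpus units matching the i-th target syllable.
    target_cost(unit, i): distance of a unit to the i-th target
        (phonetic environment + prosody in the paper's setting).
    concat_cost(prev_unit, unit): join cost between adjacent units.
    """
    n = len(candidates)
    # best[i][j] = (cumulative cost, backpointer) for the j-th
    # candidate of slot i.
    best = [[(target_cost(u, 0), None) for u in candidates[0]]]
    for i in range(1, n):
        row = []
        for u in candidates[i]:
            costs = [best[i - 1][k][0] + concat_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = int(np.argmin(costs))
            row.append((costs[k_min] + target_cost(u, i), k_min))
        best.append(row)
    # Backtrack from the cheapest final candidate.
    j = int(np.argmin([c for c, _ in best[-1]]))
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```

The selected units' lip parameters are then concatenated (and smoothed at the joins) to drive the face animation; including pitch and intensity in `target_cost` is what the paper reports as improving lip-sync quality beyond duration alone.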