This paper describes a text-independent speaker recognition method using predictive neural networks. For text-independent speaker recognition, an ergodic model, which allows transitions from each state to any other state as well as self-transitions, is adopted as the speaker model, and one predictive neural network is assigned to each state. The proposed method was compared with quantization-distortion-based methods, HMM-based methods, and a discriminative-neural-network-based method in text-independent speaker identification experiments on 24 female speakers. The proposed method gave the highest identification rate, 100.0%, demonstrating the effectiveness of predictive neural networks for representing speaker individuality.
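The scoring idea behind an ergodic predictive model can be sketched as follows. This is a minimal illustration, not the method of the paper: each state's predictive neural network is replaced by a hypothetical linear predictor (a matrix and a bias), and transitions are assumed to carry no cost, so the best state sequence reduces to choosing the lowest-error state for each frame. All function names are assumptions made for the example.

```python
def prediction_error(predictor, prev_frame, frame):
    """Squared error of one state's predictor for the current frame.

    `predictor` is a (matrix, bias) pair: a linear stand-in (assumption)
    for the predictive neural network assigned to a state.
    """
    matrix, bias = predictor
    predicted = [
        sum(m * x for m, x in zip(row, prev_frame)) + b
        for row, b in zip(matrix, bias)
    ]
    return sum((p - y) ** 2 for p, y in zip(predicted, frame))


def accumulated_error(state_predictors, frames):
    """Total prediction error of one speaker model over an utterance.

    Every state may follow every state (ergodic), and transitions are
    assumed to be free, so each frame takes the best-predicting state.
    """
    total = 0.0
    for prev_frame, frame in zip(frames, frames[1:]):
        total += min(
            prediction_error(p, prev_frame, frame) for p in state_predictors
        )
    return total


def identify(speaker_models, frames):
    """Return the speaker whose model predicts the frames with least error."""
    return min(
        speaker_models,
        key=lambda name: accumulated_error(speaker_models[name], frames),
    )


# Toy one-dimensional "utterance" in which each frame doubles the previous one.
frames = [[1.0], [2.0], [4.0], [8.0]]
speaker_models = {
    "A": [([[2.0]], [0.0]), ([[1.0]], [0.0])],  # doubling state, identity state
    "B": [([[1.0]], [0.0]), ([[0.5]], [0.0])],  # identity state, halving state
}
best = identify(speaker_models, frames)
```

In this toy case speaker "A" is selected because one of its states predicts every frame exactly. A real system would use trained networks over spectral feature vectors and might also weight transitions.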
Hiroaki HATTORI Satoshi NAKAMURA Kiyohiro SHIKANO Shigeki SAGAYAMA
This paper proposes a new speaker adaptation method using a speaker weighting technique for multiple reference speaker training of a hidden Markov model (HMM). The proposed method considers the similarities between an input speaker and multiple reference speakers, and uses these similarities to control the influence of the reference speakers upon the HMM. The evaluation experiments were carried out on the /b, d, g, m, n, N/ phoneme recognition task using 8 speakers. Average recognition rates were 68.0%, 66.4%, and 65.6%, respectively, for three test sets with different speech styles. These were 4.8%, 8.8%, and 10.5% higher than the rates of the spectrum mapping method, and also 1.6%, 6.7%, and 8.2% higher than the rates of the multiple reference speaker training, the supplemented HMM. The evaluation experiments confirmed the effectiveness of the proposed method.
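One way to picture speaker weighting is sketched below. The abstract does not state how the similarities are computed or how they enter HMM training, so this example assumes non-negative similarity scores that are normalized into weights and then used to mix per-speaker discrete output distributions. The function names and the mixing rule are illustrative assumptions only.

```python
def speaker_weights(similarities):
    """Normalize non-negative similarity scores so that they sum to one."""
    total = sum(similarities)
    return [s / total for s in similarities]


def weighted_output_probs(per_speaker_probs, weights):
    """Mix per-speaker output distributions using the speaker weights.

    `per_speaker_probs[i]` is the discrete output distribution estimated
    from reference speaker i for one HMM state (hypothetical layout).
    """
    n_symbols = len(per_speaker_probs[0])
    mixed = [0.0] * n_symbols
    for probs, w in zip(per_speaker_probs, weights):
        for k, p in enumerate(probs):
            mixed[k] += w * p
    return mixed


# Three reference speakers; the first is assumed most similar to the input speaker.
weights = speaker_weights([2.0, 1.0, 1.0])
mixed = weighted_output_probs([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], weights)
```

Because the weights sum to one, the mixed distribution remains a valid probability distribution, and reference speakers that resemble the input speaker contribute more to it.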
Hiroaki HATTORI Shigeki SAGAYAMA
This paper describes a new supervised speaker adaptation method based on vector field smoothing for small amounts of adaptation data. This method assumes that the correspondence of feature vectors between speakers can be viewed as a kind of smooth vector field, and introduces interpolation and smoothing of the correspondence into the adaptation process to achieve higher adaptation performance with small amounts of data. The proposed adaptation method was applied to discrete HMM-based speech recognition and evaluated in Japanese phoneme and phrase recognition experiments. Using 10 words as the adaptation data, the proposed method produced almost the same results as the conventional codebook mapping method with 25 words. These experiments clearly confirmed the effectiveness of the proposed method.
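The interpolation and smoothing of a transfer vector field can be illustrated with a short sketch. It assumes that a transfer vector (adapted vector minus reference vector) is known only for the codebook entries observed in the adaptation data, and that every entry then receives a distance-weighted average of those observed transfer vectors. The exponential weighting kernel, the `sharpness` parameter, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math


def closeness(a, b, sharpness=1.0):
    """Weight that decays with squared distance (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / sharpness)


def smooth_transfer_field(reference, transfers, sharpness=1.0):
    """Estimate a transfer vector for every reference codebook entry.

    `transfers` maps the index of an observed entry to its transfer
    vector. Unobserved entries are interpolated and observed entries are
    smoothed by the same distance-weighted average.
    """
    dim = len(reference[0])
    field = []
    for r in reference:
        num = [0.0] * dim
        den = 0.0
        for j, t in transfers.items():
            w = closeness(r, reference[j], sharpness)
            den += w
            for k in range(dim):
                num[k] += w * t[k]
        field.append([n / den for n in num])
    return field


def adapt(reference, transfers, sharpness=1.0):
    """Shift each reference vector by its smoothed transfer vector."""
    field = smooth_transfer_field(reference, transfers, sharpness)
    return [[x + d for x, d in zip(r, v)] for r, v in zip(reference, field)]


# Entries 0 and 1 were observed in the adaptation data; entry 2 was not.
reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
transfers = {0: [1.0, -1.0], 1: [1.0, -1.0]}
adapted = adapt(reference, transfers)
```

Here the unobserved third entry inherits the shift shared by its neighbours, which shows how a smoothness assumption lets a small adaptation set cover the whole codebook.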