Recognition of Connected Digit Speech in Japanese Collected over the Telephone Network

Hisashi KAWAI, Tohru SHIMIZU, Norio HIGUCHI

Summary:

This paper describes experimental results on whole-word HMM-based speech recognition of connected digits in Japanese, with special focus on the training data size and the "sheep and goats" problem. The training data comprises 757,000 digits uttered by 2,000 speakers, while the testing data comprises 399,000 digits uttered by 1,700 speakers. The best word error rate for unknown-length strings was 1.64%, obtained using context-dependent HMMs. The word error rate was measured for various subsets of the training data, reduced both in the number of speakers (s) and in the number of utterances per speaker (u). As a result, an empirical formula, $s\left[\{\min(0.62\,s^{0.75},\, u)\}^{0.74} + \{\max(0,\, u - 0.62\,s^{0.75})\}^{0.27}\right] = D(E_w)$, was developed, where $E_w$ and $D(E_w)$ designate the word error rate and the effective data size, respectively. Analyses were conducted on several aspects of the low-performance speakers, who account for the major part of the recognition errors. Attempts were also made to improve their recognition performance. It was found that 33% of the low-performance speakers were improved to the normal level by speaker clustering centered around each low-performance speaker.
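
The empirical formula in the summary can be evaluated directly for a given number of speakers and utterances per speaker. The sketch below is a minimal illustration, not the authors' code: the function name `effective_data_size` and the variable `knee` are hypothetical, and the exponents are read as s^0.75, 0.74, and 0.27 from the summary. The summary does not give the mapping from the effective data size D to the word error rate, so only D is computed.

```python
def effective_data_size(s: float, u: float) -> float:
    """Evaluate s * [min(0.62*s^0.75, u)^0.74 + max(0, u - 0.62*s^0.75)^0.27].

    s: number of training speakers (must be positive)
    u: number of utterances per speaker (must be non-negative)
    """
    if s <= 0 or u < 0:
        raise ValueError("s must be positive and u must be non-negative")

    # Utterance count per speaker at which the formula switches exponents
    # (the name "knee" is an assumption made for this sketch).
    knee = 0.62 * s ** 0.75

    # Utterances up to the knee contribute with exponent 0.74 ...
    below = min(knee, u) ** 0.74
    # ... and utterances beyond the knee contribute with exponent 0.27.
    above = max(0.0, u - knee) ** 0.27

    return s * (below + above)


if __name__ == "__main__":
    # Illustrative inputs only; these are not configurations from the paper.
    for speakers, utterances in [(100, 10), (100, 50), (2000, 50)]:
        d = effective_data_size(speakers, utterances)
        print(f"s={speakers}, u={utterances}: D = {d:.1f}")
```

Because the second exponent (0.27) is much smaller than the first (0.74), the formula as written implies that adding utterances from an existing speaker yields less and less effective data once u exceeds 0.62 s^0.75.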

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E84-D No.3 pp.374-383
Publication Date
2001/03/01
Type of Manuscript
PAPER
Category
Speech and Hearing
