IEICE TRANSACTIONS on Fundamentals

Connectionist Approaches to Large Vocabulary Continuous Speech Recognition

Hidefumi SAWAI, Yasuhiro MINAMI, Masanori MIYATAKE, Alex WAIBEL, Kiyohiro SHIKANO

Summary:

This paper describes recent progress in a connectionist large-vocabulary continuous speech recognition system that integrates speech recognition and language processing. The speech recognition part consists of Large Phonemic Time-Delay Neural Networks (TDNNs), which can automatically spot all 24 Japanese phonemes (i.e., 18 consonants /b/, /d/, /g/, /p/, /t/, /k/, /m/, /n/, /N/, /s/, /sh/ ([ʃ]), /h/, /z/, /ch/ ([tʃ]), /ts/, /r/, /w/, /y/ ([j]); 5 vowels /a/, /i/, /u/, /e/, /o/; and the double consonant /Q/ or silence) simply by scanning the input speech, without any explicit segmentation technique. The language processing part is a predictive LR parser, guided by an LR parsing table generated automatically from context-free grammar rules, which proceeds left to right without backtracking. Time alignment between the predicted phonemes and the sequence of TDNN phoneme outputs is carried out by DTW (dynamic time warping) matching. We call this 'hybrid' integrated recognition system the 'TDNN-LR' method. We report that large-vocabulary isolated word and continuous speech recognition using the TDNN-LR method provided excellent speaker-dependent recognition performance, and that incremental training using a small number of training tokens is very effective for adaptation to speaking rate. Furthermore, we report some new achievements as extensions of the TDNN-LR method: (1) two proposed NN architectures provide phoneme recognition performance that is robust to variations in speaking manner, (2) a speaker-adaptation technique can be realized using a NN mapping function between input and standard speakers, and (3) new architectures proposed for speaker-independent recognition provide performance that nearly matches speaker-dependent recognition performance.
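
To illustrate the time-alignment step mentioned in the summary, the following is a minimal sketch of DTW matching between a predicted phoneme sequence and frame-wise phoneme scores. It is not taken from the paper: the function name `dtw_align_score`, the representation of TDNN outputs as one dictionary of phoneme activations per frame, the log-domain scoring, and the simple "stay or advance" step pattern are all assumptions made for illustration.

```python
import math

def dtw_align_score(predicted, frame_scores, floor=1e-10):
    """Best cumulative log-score for aligning `predicted` to the frames.

    predicted    : list of phoneme labels proposed by a parser (hypothetical input)
    frame_scores : list of dicts, one per frame, mapping phoneme -> activation in (0, 1]
                   (an assumed stand-in for TDNN outputs)
    Every frame is assigned to exactly one predicted phoneme, phonemes are
    visited in order, and each phoneme covers at least one frame.
    """
    n_ph, n_fr = len(predicted), len(frame_scores)
    if n_ph == 0 or n_fr < n_ph:
        return -math.inf  # no valid monotonic alignment exists

    neg = -math.inf
    # best[i][t]: best score of any alignment that puts frame t on phoneme i
    best = [[neg] * n_fr for _ in range(n_ph)]
    for t in range(n_fr):
        for i in range(min(t + 1, n_ph)):
            local = math.log(max(frame_scores[t].get(predicted[i], 0.0), floor))
            if t == 0:
                prev = 0.0  # only phoneme 0 can own the first frame
            else:
                stay = best[i][t - 1]                        # same phoneme continues
                advance = best[i - 1][t - 1] if i > 0 else neg  # move to next phoneme
                prev = max(stay, advance)
            best[i][t] = prev + local
    return best[n_ph - 1][n_fr - 1]

# Toy example: four frames of made-up activations, two competing hypotheses.
frames = [
    {"a": 0.9, "k": 0.1},
    {"a": 0.8, "k": 0.2},
    {"k": 0.9, "a": 0.1},
    {"i": 0.7, "a": 0.2, "k": 0.1},
]
score_aki = dtw_align_score(["a", "k", "i"], frames)
score_aka = dtw_align_score(["a", "k", "a"], frames)
```

In a setup like the one the summary describes, a parser would propose candidate phoneme sequences and a score of this kind would rank them; here the hypothesis /a k i/ receives a higher score than /a k a/ on the toy frames.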

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E74-A No.7 pp.1834-1844
Publication Date
1991/07/25
Type of Manuscript
Special Section PAPER (Special Issue on Continuous Speech Recognition and Understanding)
Category
Continuous Speech Recognition