Incremental Language Modeling for Automatic Transcription of Broadcast News

Katsutoshi OHTSUKI, Long NGUYEN

Summary:

In this paper, we address the task of incremental language modeling for automatic transcription of broadcast news speech. Daily broadcast news naturally contains new words that are not in the lexicon of the speech recognition system but are important for downstream applications such as information retrieval or machine translation. To recognize these new words, the lexicon and the language model of the speech recognition system need to be updated periodically. We propose a method for estimating a list of words to be added to the lexicon from time-series text data. Experimental results on the RT04 Broadcast News data and other TV audio data show that this method provides a substantial and stable reduction in both out-of-vocabulary rates and speech recognition word error rates.
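
To make the idea of a periodic lexicon update concrete, the following is a minimal illustrative sketch and not the authors' method, which the summary does not specify. It assumes a simple recency-weighted count over daily text to rank out-of-lexicon words, and it measures the out-of-vocabulary (OOV) rate before and after adding the top candidates. The function names, the `decay` parameter, and the toy data are all hypothetical.

```python
from collections import Counter


def oov_rate(tokens, lexicon):
    """Fraction of running words in `tokens` that are missing from `lexicon`."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in lexicon)
    return missing / len(tokens)


def select_new_words(daily_texts, lexicon, decay=0.5, top_n=2):
    """Rank out-of-lexicon words by a recency-weighted count.

    daily_texts: list of token lists, oldest day first.
    decay: hypothetical per-day down-weighting factor for older text.
    """
    scores = Counter()
    last = len(daily_texts) - 1
    for day, tokens in enumerate(daily_texts):
        weight = decay ** (last - day)  # the most recent day gets weight 1.0
        for t in tokens:
            if t not in lexicon:
                scores[t] += weight
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


# Toy data (invented for illustration only).
lexicon = {"the", "news", "today"}
daily_texts = [
    ["the", "tsunami", "news"],        # older day
    ["tsunami", "today", "podcast"],   # most recent day
]
test_tokens = ["the", "tsunami", "news", "today"]

before = oov_rate(test_tokens, lexicon)
new_words = select_new_words(daily_texts, lexicon)
after = oov_rate(test_tokens, lexicon | set(new_words))
```

In this sketch the updated lexicon is simply the union of the old lexicon and the selected words. A real system would also need pronunciations for the new words and a re-estimated language model, which are outside the scope of this example.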

Publication
IEICE TRANSACTIONS on Information Vol.E90-D No.2 pp.526-532
Publication Date
2007/02/01
Publicized
Online ISSN
1745-1361
DOI
10.1093/ietisy/e90-d.2.526
Type of Manuscript
PAPER
Category
Speech and Hearing
