
Author Search Result

[Author] Ian R. LANE (2 hits)

  • Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures

    Ian R. LANE, Tatsuya KAWAHARA

    PAPER-Speech Recognition
    Vol: E89-D No:3  Page(s): 931-938

    Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework that incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via an (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (those that do not affect translation) were ignored, a further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.
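
    The abstract describes the combination of the three measures only at a high level. The sketch below shows one minimal way such a joint score could be computed and thresholded, assuming each measure is already normalized to [0, 1]; the `UtteranceScores` container, the weights, and the acceptance threshold are illustrative placeholders, not values from the paper.

    ```python
    from dataclasses import dataclass

    @dataclass
    class UtteranceScores:
        gpp: float                  # generalized posterior probability from the decoder, in [0, 1]
        in_domain: float            # match between the utterance and the back-end application domain
        discourse_coherence: float  # consistency with the preceding utterance(s) in the dialogue

    def joint_confidence(s: UtteranceScores,
                         w_gpp: float = 0.6,
                         w_dom: float = 0.25,
                         w_coh: float = 0.15) -> float:
        """Combine the three measures into a single confidence score.

        A plain weighted sum is only one possible combination scheme; the
        weights here are illustrative placeholders, not tuned values.
        """
        return w_gpp * s.gpp + w_dom * s.in_domain + w_coh * s.discourse_coherence

    def verify(s: UtteranceScores, threshold: float = 0.5) -> bool:
        """Accept the ASR hypothesis only if the joint confidence clears the threshold."""
        return joint_confidence(s) >= threshold

    # Example: a hypothesis with a fairly high GPP is still rejected because it
    # matches the application domain poorly and is incoherent with the dialogue.
    hypothesis = UtteranceScores(gpp=0.70, in_domain=0.10, discourse_coherence=0.10)
    print(verify(hypothesis))  # False: 0.6*0.70 + 0.25*0.10 + 0.15*0.10 = 0.46
    ```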

  • Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching

    Ian R. LANE, Tatsuya KAWAHARA, Tomoko MATSUI, Satoshi NAKAMURA

    PAPER-Spoken Language Systems
    Vol: E88-D No:3  Page(s): 446-454

    An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the topic is automatically detected from the user's utterance, and speech recognition is then performed with an appropriate topic-dependent language model. This approach enables users to switch freely between domains while maintaining high recognition accuracy. Because topic detection is performed on a single utterance, detection errors may occur and propagate through the system. To improve robustness, a hierarchical back-off mechanism is introduced in which detailed topic models are applied when topic detection is confident, and wider models covering multiple topics are applied in cases of uncertainty. The performance of the proposed architecture is evaluated in combination with two topic detection methods: unigram likelihood and SVMs (Support Vector Machines). On the ATR Basic Travel Expression Corpus, both methods provide a significant reduction in WER (9.7% and 10.3%, respectively) compared to a single-language-model system. Furthermore, recognition accuracy is comparable to that of decoding with all topic-dependent models in parallel, while the required computational cost is much lower.
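
    As a rough illustration of the hierarchical back-off mechanism described above, the sketch below selects a language model from a detected topic label and its confidence. The `detect_topic` interface, the topic names, the `BACKOFF` table, and the threshold are hypothetical placeholders standing in for either detection method named in the abstract (unigram likelihood or an SVM classifier); they are not the paper's actual components.

    ```python
    from typing import Callable, Dict, Tuple

    # Hypothetical topic hierarchy: detailed topics back off to a wider parent
    # model; anything without a parent backs off to a topic-independent model.
    BACKOFF: Dict[str, str] = {
        "hotel_reservation": "accommodation",
        "hotel_facilities": "accommodation",
        "flight_booking": "transportation",
    }

    def select_language_model(
        utterance: str,
        detect_topic: Callable[[str], Tuple[str, float]],
        language_models: Dict[str, str],
        confidence_threshold: float = 0.7,
    ) -> str:
        """Choose a topic-dependent LM, backing off to a wider model when detection is uncertain."""
        topic, confidence = detect_topic(utterance)
        if confidence < confidence_threshold:
            # Uncertain detection: apply the wider model covering this topic,
            # or the topic-independent model if no wider model is defined.
            topic = BACKOFF.get(topic, "general")
        return language_models.get(topic, language_models["general"])

    # Toy usage with placeholder LM identifiers and a low-confidence detector.
    models = {"hotel_reservation": "LM-hotel-reservation",
              "accommodation": "LM-accommodation",
              "general": "LM-general"}
    detector = lambda utt: ("hotel_reservation", 0.55)  # detection below threshold
    print(select_language_model("I'd like a room for two nights", detector, models))
    # -> LM-accommodation (backed off from the detailed hotel_reservation model)
    ```

    Decoding would then proceed with the single selected model, which is consistent with the abstract's point that the approach avoids the cost of running all topic-dependent models in parallel.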