Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via a (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error-rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (that do not affect translation) were ignored, further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ian R. LANE, Tatsuya KAWAHARA, "Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures," in IEICE TRANSACTIONS on Information, vol. E89-D, no. 3, pp. 931-938, March 2006, doi: 10.1093/ietisy/e89-d.3.931.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.931/_p
@ARTICLE{e89-d_3_931,
author={Ian R. LANE and Tatsuya KAWAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures},
year={2006},
volume={E89-D},
number={3},
pages={931-938},
abstract={Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via a (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error-rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (that do not affect translation) were ignored, further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.},
keywords={},
doi={10.1093/ietisy/e89-d.3.931},
ISSN={1745-1361},
month={March},
}
TY  - JOUR
TI  - Verification of Speech Recognition Results Incorporating In-domain Confidence and Discourse Coherence Measures
T2  - IEICE TRANSACTIONS on Information
SP  - 931
EP  - 938
AU  - Ian R. LANE
AU  - Tatsuya KAWAHARA
PY  - 2006
DO  - 10.1093/ietisy/e89-d.3.931
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E89-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - March 2006
AB  - Conventional confidence measures for assessing the reliability of ASR (automatic speech recognition) output are typically derived from "low-level" information which is obtained during speech recognition decoding. In contrast to these approaches, we propose a novel utterance verification framework which incorporates "high-level" knowledge sources. Specifically, we investigate two application-independent measures: in-domain confidence, the degree of match between the input utterance and the application domain of the back-end system, and discourse coherence, the consistency between consecutive utterances in a dialogue session. A joint confidence score is generated by combining these two measures with an orthodox measure based on GPP (generalized posterior probability). The proposed framework was evaluated on an utterance verification task for spontaneous dialogue performed via a (English/Japanese) speech-to-speech translation system. Incorporating the two proposed measures significantly improved utterance verification accuracy compared to using GPP alone, realizing reductions in CER (confidence error-rate) of 11.4% and 8.1% for the English and Japanese sides, respectively. When negligible ASR errors (that do not affect translation) were ignored, further improvement was achieved for the English side, realizing a reduction in CER of up to 14.6% compared to the GPP case.
ER  - 