Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge

Yasuhisa FUJII; Kazumasa YAMAMOTO; Seiichi NAKAGAWA

doi:10.1587/transinf.E95.D.1101

Summary:

This paper presents a novel method for improving the readability of automatic speech recognition (ASR) results for classroom lectures. Because classroom speech is spontaneous and contains many ill-formed utterances with various disfluencies, ASR results should be edited before they are presented to users: disfluencies removed, sentence boundaries determined, punctuation marks inserted, and dropped words repaired. Owing to the many domain-dependent words and casual speaking styles, even state-of-the-art recognizers achieve only a 30-50% word error rate on classroom lectures. A method for improving the readability of ASR results must therefore be robust to recognition errors. One way to achieve this robustness is to use multiple hypotheses instead of the single-best hypothesis. However, when the multiple hypotheses are represented by a lattice (or a confusion network), it is difficult to apply sentence-level knowledge, such as chunking and dependency parsing, which is imperative for determining the discourse structure and therefore for improving readability. In this paper, we propose a novel algorithm that infers clean, readable transcripts from multiple hypotheses of spontaneous speech represented by a confusion network while integrating sentence-level knowledge. Automatic and manual evaluations showed that using multiple hypotheses and sentence-level knowledge is effective in improving the readability of ASR results while preserving understandability.
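
To make the terms in the summary concrete, the following is a minimal sketch of a confusion network and of how sentence-level knowledge can change which hypothesis is selected. It is not the algorithm proposed in the paper, which the summary does not detail: the sketch simply enumerates every path by brute force and rescores the resulting word sequences. The words, probabilities, the names `n_best`, `rescore`, and `toy_sentence_score`, and the penalty values are all assumptions made for illustration.

```python
import heapq
from itertools import product
from math import log

# A confusion network is a sequence of slots. Each slot maps candidate
# words to posterior probabilities; "" stands for the empty (skip) arc.
# All words and probabilities here are invented for illustration.
cn = [
    {"we": 0.6, "he": 0.4},
    {"um": 0.7, "": 0.3},
    {"pars": 0.55, "parse": 0.45},
    {"sentences": 0.9, "sentence": 0.1},
]


def n_best(network, n):
    """Return the n most probable paths as (log_prob, words) pairs.

    Brute-force enumeration: fine for a toy network, exponential in general.
    """
    paths = []
    for combo in product(*(slot.items() for slot in network)):
        words = [w for w, _ in combo if w]  # drop empty arcs
        log_prob = sum(log(p) for _, p in combo)
        paths.append((log_prob, words))
    return heapq.nlargest(n, paths, key=lambda path: path[0])


def toy_sentence_score(words):
    """Hypothetical stand-in for sentence-level knowledge.

    Penalizes fillers and out-of-vocabulary words; a real system would use
    chunking or dependency parsing instead.
    """
    fillers = {"um", "uh"}
    vocab = {"we", "he", "parse", "sentences", "sentence"}
    n_fillers = sum(w in fillers for w in words)
    n_unknown = sum(w not in fillers and w not in vocab for w in words)
    return -2.0 * (n_fillers + n_unknown)


def rescore(hypotheses, sentence_score, weight=1.0):
    """Pick the word sequence maximizing log_prob + weight * sentence_score."""
    best = max(hypotheses, key=lambda h: h[0] + weight * sentence_score(h[1]))
    return best[1]


one_best = n_best(cn, 1)[0][1]
reranked = rescore(n_best(cn, 16), toy_sentence_score)
print(" ".join(one_best))
print(" ".join(reranked))
```

In this toy network the single-best path keeps the filler and a misrecognized word, whereas rescoring all 16 paths with the sentence-level score selects a cleaner sequence. Enumerating paths scales exponentially with the number of slots, which is one reason applying sentence-level knowledge to lattices and confusion networks is considered difficult.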

Publication: IEICE TRANSACTIONS on Information Vol.E95-D No.4 pp.1101-1111

Publication Date: 2012/04/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E95.D.1101

Type of Manuscript: PAPER

Category: Speech and Hearing

Yasuhisa FUJII, Kazumasa YAMAMOTO, Seiichi NAKAGAWA, "Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge" in IEICE TRANSACTIONS on Information, vol. E95-D, no. 4, pp. 1101-1111, April 2012, doi: 10.1587/transinf.E95.D.1101.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.1101/_p

@ARTICLE{e95-d_4_1101,
author={Yasuhisa FUJII and Kazumasa YAMAMOTO and Seiichi NAKAGAWA},
journal={IEICE TRANSACTIONS on Information},
title={Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge},
year={2012},
volume={E95-D},
number={4},
pages={1101-1111},
keywords={},
doi={10.1587/transinf.E95.D.1101},
ISSN={1745-1361},
month={April},
}

TY - JOUR
TI - Improving the Readability of ASR Results for Lectures Using Multiple Hypotheses and Sentence-Level Knowledge
T2 - IEICE TRANSACTIONS on Information
SP - 1101
EP - 1111
AU - Yasuhisa FUJII
AU - Kazumasa YAMAMOTO
AU - Seiichi NAKAGAWA
PY - 2012
DO - 10.1587/transinf.E95.D.1101
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2012
ER -