Yasuhisa FUJII Kazumasa YAMAMOTO Seiichi NAKAGAWA
This paper presents a novel method for improving the readability of automatic speech recognition (ASR) results for classroom lectures. Because speech in a classroom is spontaneous and contains many ill-formed utterances with various disfluencies, ASR results should be edited to improve their readability before being presented to users, by applying operations such as removing disfluencies, determining sentence boundaries, inserting punctuation marks, and repairing dropped words. Owing to the presence of many kinds of domain-dependent words and casual speaking styles, even state-of-the-art recognizers achieve only a 30-50% word error rate for speech in classroom lectures. Therefore, a method for improving the readability of ASR results must be robust to recognition errors. Using multiple hypotheses instead of the single-best hypothesis is one way to achieve such robustness. However, when multiple hypotheses are represented by a lattice (or a confusion network), it is difficult to exploit sentence-level knowledge, such as chunking and dependency parsing, which is essential for determining the discourse structure and hence for improving readability. In this paper, we propose a novel algorithm that infers clean, readable transcripts from multiple hypotheses of spontaneous speech represented by a confusion network, while integrating sentence-level knowledge. Automatic and manual evaluations showed that using multiple hypotheses and sentence-level knowledge is effective in improving the readability of ASR results while preserving their understandability.
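For readers unfamiliar with confusion networks, the following is a minimal sketch of the data structure the abstract refers to: each slot holds competing word hypotheses with posterior probabilities, and a naive decoder simply picks the best word per slot. This toy example (the `ConfusionNetwork` type, `example_cn`, and `consensus_decode` are illustrative names, not from the paper) only shows why a confusion network exposes alternatives to recognition errors; the paper's algorithm goes further by integrating sentence-level knowledge when choosing among the alternatives.

```python
# Toy confusion network: one dict of {word: posterior} per slot.
# "-" marks an epsilon arc, i.e. the hypothesis that no word occupies the slot.
from typing import Dict, List

ConfusionNetwork = List[Dict[str, float]]

example_cn: ConfusionNetwork = [
    {"the": 0.7, "a": 0.2, "-": 0.1},
    {"lecture": 0.5, "lectures": 0.4, "-": 0.1},
    {"uh": 0.6, "-": 0.4},            # likely a disfluency; "-" would drop it
    {"starts": 0.8, "start": 0.2},
]

def consensus_decode(cn: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word in each slot, skipping epsilon arcs."""
    words = []
    for slot in cn:
        best = max(slot, key=slot.get)
        if best != "-":
            words.append(best)
    return words

print(consensus_decode(example_cn))   # ['the', 'lecture', 'uh', 'starts']
```

A slot-by-slot maximum like this ignores cross-word constraints; sentence-level cues (e.g. chunking or dependency structure over candidate word sequences) could instead prefer dropping the filler "uh" and selecting grammatically consistent alternatives, which is the kind of knowledge the proposed method integrates.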