This paper describes two Japanese continuous speech recognition systems (system-1 and system-2) based on phoneme-based HMMs and a two-level grammar approach. The two grammars are an intra-phrase transition network grammar for phrase recognition and an inter-phrase dependency grammar for sentence recognition. A joint score, combining acoustic likelihood and linguistic certainty factors derived from phoneme-based HMMs and dependency rules, is maximized to obtain the best sentence recognition results. To keep the amount of computation practical, system-1 is tuned for sentences uttered phrase-by-phrase and system-2 is tuned for sentence utterances. In system-1, an efficient parsing algorithm is used for each grammar: a bi-directional network parser and a breadth-first dependency parser. With the phrase-network parser, input phrase utterances are parsed bi-directionally, both left-to-right and right-to-left, and optimal Viterbi paths are found along which the accumulated phonetic likelihood is maximized. The dependency parser uses efficient breadth-first search and beam search algorithms. For system-2, we have extended the dependency analysis algorithm to sentence utterances, using a technique for detecting the most likely multi-phrase candidates based on Viterbi phrase alignment. When the perplexity of the phrase syntax is 40, system-1 and system-2 increase phrase recognition performance in the sentence by approximately 6% and 14%, respectively, showing the effectiveness of semantic dependency analysis.
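The joint-score idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified right-to-left beam search, assuming that each phrase position comes with a short list of candidate phrases and acoustic log-likelihoods, that dependency rules are given as log-domain scores for (modifier, head) pairs, and that every phrase except the last modifies some later phrase. The names `best_sentence`, `lattice`, `dep_score`, `weight`, and `beam` are hypothetical, the toy scores are invented for illustration, and the non-crossing constraint on dependencies is omitted for brevity.

```python
import math
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, float]           # (phrase label, acoustic log-likelihood)
Hypothesis = Tuple[float, Tuple[str, ...]]  # (joint score, phrases left-to-right)


def best_sentence(
    lattice: List[List[Candidate]],
    dep_score: Dict[Tuple[str, str], float],
    weight: float = 1.0,
    beam: int = 3,
) -> Optional[Hypothesis]:
    """Beam search for the phrase sequence maximizing
    sum(acoustic log-likelihood) + weight * sum(dependency score).

    Assumption (not from the paper): phrases are added right to left, so the
    possible heads of a new phrase are already part of the hypothesis.
    """
    # The final phrase is the root of the sentence and needs no head.
    hyps: List[Hypothesis] = [(ac, (label,)) for label, ac in lattice[-1]]
    hyps = sorted(hyps, reverse=True)[:beam]

    for candidates in reversed(lattice[:-1]):
        expanded: List[Hypothesis] = []
        for score, phrases in hyps:
            for label, ac in candidates:
                # Pick the best-scoring head among the later phrases.
                dep = max(dep_score.get((label, head), -math.inf) for head in phrases)
                if dep == -math.inf:
                    continue  # no dependency rule licenses this phrase here
                expanded.append((score + ac + weight * dep, (label,) + phrases))
        # Beam pruning: keep only the highest-scoring partial sentences.
        hyps = sorted(expanded, reverse=True)[:beam]

    return hyps[0] if hyps else None


# Toy data (invented): acoustics alone would prefer A1 and B1,
# but the dependency scores favor A2 modifying B2 and B2 modifying C1.
lattice = [
    [("A1", -1.0), ("A2", -1.5)],
    [("B1", -2.0), ("B2", -2.2)],
    [("C1", -0.5)],
]
dep_score = {
    ("A1", "B1"): -3.0, ("A1", "B2"): -3.0, ("A1", "C1"): -3.0,
    ("A2", "B1"): -2.5, ("A2", "B2"): -0.1, ("A2", "C1"): -2.0,
    ("B1", "C1"): -1.0, ("B2", "C1"): -0.2,
}

result = best_sentence(lattice, dep_score)
if result is not None:
    print(result[1])
```

In this toy setting the dependency scores overturn the acoustically preferred candidates, which is the kind of effect the abstract attributes to dependency analysis. The `weight` parameter is an assumed knob for balancing the two knowledge sources; the abstract does not say how the real systems combine them.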
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Sho-ichi MATSUNAGA, Shigeru HOMMA, Shigeki SAGAYAMA, Sadaoki FURUI, "Continuous Speech Recognition Using a Dependency Grammar and Phoneme-Based HMMs" in IEICE TRANSACTIONS on Fundamentals,
vol. E74-A, no. 7, pp. 1826-1833, July 1991.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e74-a_7_1826/_p
@ARTICLE{e74-a_7_1826,
author={Sho-ichi MATSUNAGA and Shigeru HOMMA and Shigeki SAGAYAMA and Sadaoki FURUI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Continuous Speech Recognition Using a Dependency Grammar and Phoneme-Based HMMs},
year={1991},
volume={E74-A},
number={7},
pages={1826-1833},
abstract={This paper describes two Japanese continuous speech recognition systems (system-1 and system-2) based on phoneme-based HMMs and a two-level grammar approach. The two grammars are an intra-phrase transition network grammar for phrase recognition and an inter-phrase dependency grammar for sentence recognition. A joint score, combining acoustic likelihood and linguistic certainty factors derived from phoneme-based HMMs and dependency rules, is maximized to obtain the best sentence recognition results. To keep the amount of computation practical, system-1 is tuned for sentences uttered phrase-by-phrase and system-2 is tuned for sentence utterances. In system-1, an efficient parsing algorithm is used for each grammar: a bi-directional network parser and a breadth-first dependency parser. With the phrase-network parser, input phrase utterances are parsed bi-directionally, both left-to-right and right-to-left, and optimal Viterbi paths are found along which the accumulated phonetic likelihood is maximized. The dependency parser uses efficient breadth-first search and beam search algorithms. For system-2, we have extended the dependency analysis algorithm to sentence utterances, using a technique for detecting the most likely multi-phrase candidates based on Viterbi phrase alignment. When the perplexity of the phrase syntax is 40, system-1 and system-2 increase phrase recognition performance in the sentence by approximately 6% and 14%, respectively, showing the effectiveness of semantic dependency analysis.},
month={July},}
TY - JOUR
TI - Continuous Speech Recognition Using a Dependency Grammar and Phoneme-Based HMMs
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1826
EP - 1833
AU - Sho-ichi MATSUNAGA
AU - Shigeru HOMMA
AU - Shigeki SAGAYAMA
AU - Sadaoki FURUI
PY - 1991
JO - IEICE TRANSACTIONS on Fundamentals
VL - E74-A
IS - 7
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - July 1991
AB - This paper describes two Japanese continuous speech recognition systems (system-1 and system-2) based on phoneme-based HMMs and a two-level grammar approach. The two grammars are an intra-phrase transition network grammar for phrase recognition and an inter-phrase dependency grammar for sentence recognition. A joint score, combining acoustic likelihood and linguistic certainty factors derived from phoneme-based HMMs and dependency rules, is maximized to obtain the best sentence recognition results. To keep the amount of computation practical, system-1 is tuned for sentences uttered phrase-by-phrase and system-2 is tuned for sentence utterances. In system-1, an efficient parsing algorithm is used for each grammar: a bi-directional network parser and a breadth-first dependency parser. With the phrase-network parser, input phrase utterances are parsed bi-directionally, both left-to-right and right-to-left, and optimal Viterbi paths are found along which the accumulated phonetic likelihood is maximized. The dependency parser uses efficient breadth-first search and beam search algorithms. For system-2, we have extended the dependency analysis algorithm to sentence utterances, using a technique for detecting the most likely multi-phrase candidates based on Viterbi phrase alignment. When the perplexity of the phrase syntax is 40, system-1 and system-2 increase phrase recognition performance in the sentence by approximately 6% and 14%, respectively, showing the effectiveness of semantic dependency analysis.
ER -