This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Kenji KITA, Tsuyoshi MORIMOTO, Kazumi OHKURA, Shigeki SAGAYAMA, Yaneo YANO, "Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling" in IEICE TRANSACTIONS on Information,
vol. E77-D, no. 2, pp. 258-265, February 1994, doi: .
Abstract: This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.
URL: https://global.ieice.org/en_transactions/information/10.1587/e77-d_2_258/_p
Copy
@ARTICLE{e77-d_2_258,
author={Kenji KITA, Tsuyoshi MORIMOTO, Kazumi OHKURA, Shigeki SAGAYAMA, Yaneo YANO, },
journal={IEICE TRANSACTIONS on Information},
title={Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling},
year={1994},
volume={E77-D},
number={2},
pages={258-265},
abstract={This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.},
keywords={},
doi={},
ISSN={},
month={February},}
Copy
TY - JOUR
TI - Spoken Sentence Recognition Based on HMM-LR with Hybrid Language Modeling
T2 - IEICE TRANSACTIONS on Information
SP - 258
EP - 265
AU - Kenji KITA
AU - Tsuyoshi MORIMOTO
AU - Kazumi OHKURA
AU - Shigeki SAGAYAMA
AU - Yaneo YANO
PY - 1994
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E77-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 1994
AB - This paper describes Japanese spoken sentence recognition using hybrid language modeling, which combines the advantages of both syntactic and stochastic language models. As the baseline system, we adopted the HMM-LR speech recognition system, with which we have already achieved good performance for Japanese phrase recognition tasks. Several improvements have been made to this system aimed at handling continuously spoken sentences. The first improvement is HMM training with continuous utterances as well as word utterances. In previous implementations, HMMs were trained with only word utterances. Continuous utterances are included in the HMM training data because coarticulation effects are much stronger in continuous utterances. The second improvement is the development of a sentential grammar for Japanese. The sentential grammar was created by combining inter- and intra-phrase CFG grammars, which were developed separately. The third improvement is the incorporation of stochastic linguistic knowledge, which includes stochastic CFG and a bigram model of production rules. The system was evaluated using continuously spoken sentences from a conference registration task that included approximately 750 words. We attained a sentence accuracy of 83.9% in the speaker-dependent condition.
ER -