This paper presents an efficient frame-synchronous beam pruning method for HMM-based automatic speech recognition. In conventional beam pruning, the few hypotheses that have greater potential to reach various words on a lexical tree are likely to be pruned out by the many hypotheses that have limited potential, since all hypotheses are treated equally without considering this potential. To make the beam pruning less restrictive for hypotheses with greater potential and more restrictive for those with less, the proposed method adds to the likelihood of each hypothesis a tentative reward given as a monotonically increasing function of the number of words reachable from the HMM state where the hypothesis stays on the lexical tree. The reward is designed so as not to break the probabilistic framework of ASR. The proposed method reduced processing time by 84% on a grammar-based 10k-word short-sentence recognition task. On a language-model-based dictation task, it also yielded an additional 23% reduction in processing time over beam pruning with the language model look-ahead technique.
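The rewarded pruning rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the specific reward function (a scaled log of the reachable-word count) and the `beam_prune` interface are hypothetical choices; the paper only requires that the reward increase monotonically with the number of reachable words.

```python
import math

def beam_prune(hypotheses, beam_width):
    """Frame-synchronous beam pruning with a tentative potential reward.

    hypotheses: list of (log_likelihood, n_reachable_words) pairs, where
    n_reachable_words counts the words still reachable from the
    hypothesis's HMM state on the lexical tree.
    """
    def reward(n_reachable):
        # Hypothetical monotonically increasing reward function.
        return 0.5 * math.log(n_reachable)

    # Score each hypothesis with its tentative reward added to the
    # log-likelihood.
    scored = [(ll + reward(n), (ll, n)) for ll, n in hypotheses]
    best = max(score for score, _ in scored)
    # Keep hypotheses whose rewarded score lies within the beam of the best.
    return [hyp for score, hyp in scored if score >= best - beam_width]

# A hypothesis near the tree root (1000 reachable words) with a lower
# likelihood survives alongside a near-leaf one; without the reward it
# would fall outside the beam (-13.0 < -10.0 - 2.0).
kept = beam_prune([(-10.0, 1), (-13.0, 1000)], beam_width=2.0)
```

In this toy example the reward (0.5 · ln 1000 ≈ 3.45) lifts the high-potential hypothesis above the pruning threshold, which is exactly the effect the paper aims for: hypotheses that can still reach many words are pruned less aggressively.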
Tsuneo KATO, Kengo FUJITA, Nobuyuki NISHIZAWA, "Efficient Beam Pruning for Speech Recognition with a Reward Considering the Potential to Reach Various Words on a Lexical Tree" in IEICE TRANSACTIONS on Information,
vol. E94-D, no. 6, pp. 1253-1259, June 2011, doi: 10.1587/transinf.E94.D.1253.
Abstract: This paper presents an efficient frame-synchronous beam pruning method for HMM-based automatic speech recognition. In conventional beam pruning, the few hypotheses that have greater potential to reach various words on a lexical tree are likely to be pruned out by the many hypotheses that have limited potential, since all hypotheses are treated equally without considering this potential. To make the beam pruning less restrictive for hypotheses with greater potential and more restrictive for those with less, the proposed method adds to the likelihood of each hypothesis a tentative reward given as a monotonically increasing function of the number of words reachable from the HMM state where the hypothesis stays on the lexical tree. The reward is designed so as not to break the probabilistic framework of ASR. The proposed method reduced processing time by 84% on a grammar-based 10k-word short-sentence recognition task. On a language-model-based dictation task, it also yielded an additional 23% reduction in processing time over beam pruning with the language model look-ahead technique.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E94.D.1253/_p
@ARTICLE{e94-d_6_1253,
author={Tsuneo KATO and Kengo FUJITA and Nobuyuki NISHIZAWA},
journal={IEICE TRANSACTIONS on Information},
title={Efficient Beam Pruning for Speech Recognition with a Reward Considering the Potential to Reach Various Words on a Lexical Tree},
year={2011},
volume={E94-D},
number={6},
pages={1253--1259},
abstract={This paper presents an efficient frame-synchronous beam pruning method for HMM-based automatic speech recognition. In conventional beam pruning, the few hypotheses that have greater potential to reach various words on a lexical tree are likely to be pruned out by the many hypotheses that have limited potential, since all hypotheses are treated equally without considering this potential. To make the beam pruning less restrictive for hypotheses with greater potential and more restrictive for those with less, the proposed method adds to the likelihood of each hypothesis a tentative reward given as a monotonically increasing function of the number of words reachable from the HMM state where the hypothesis stays on the lexical tree. The reward is designed so as not to break the probabilistic framework of ASR. The proposed method reduced processing time by 84% on a grammar-based 10k-word short-sentence recognition task. On a language-model-based dictation task, it also yielded an additional 23% reduction in processing time over beam pruning with the language model look-ahead technique.},
keywords={},
doi={10.1587/transinf.E94.D.1253},
ISSN={1745-1361},
month=jun,
}
TY - JOUR
TI - Efficient Beam Pruning for Speech Recognition with a Reward Considering the Potential to Reach Various Words on a Lexical Tree
T2 - IEICE TRANSACTIONS on Information
SP - 1253
EP - 1259
AU - Tsuneo KATO
AU - Kengo FUJITA
AU - Nobuyuki NISHIZAWA
PY - 2011
DO - 10.1587/transinf.E94.D.1253
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E94-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2011
AB - This paper presents an efficient frame-synchronous beam pruning method for HMM-based automatic speech recognition. In conventional beam pruning, the few hypotheses that have greater potential to reach various words on a lexical tree are likely to be pruned out by the many hypotheses that have limited potential, since all hypotheses are treated equally without considering this potential. To make the beam pruning less restrictive for hypotheses with greater potential and more restrictive for those with less, the proposed method adds to the likelihood of each hypothesis a tentative reward given as a monotonically increasing function of the number of words reachable from the HMM state where the hypothesis stays on the lexical tree. The reward is designed so as not to break the probabilistic framework of ASR. The proposed method reduced processing time by 84% on a grammar-based 10k-word short-sentence recognition task. On a language-model-based dictation task, it also yielded an additional 23% reduction in processing time over beam pruning with the language model look-ahead technique.
ER -