A 168-mW 2.4×-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI

Guangji HE; Takanobu SUGAHARA; Yuki MIYAMOTO; Shintaro IZUMI; Hiroshi KAWAGUCHI; Masahiko YOSHIMOTO

doi:10.1587/transele.E96.C.444


Summary :

This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm containing 2.52 M transistors for logic and 4.29 Mbit on-chip memory. The measured results show that our implementation achieves 34.2% required frequency reduction (83.3 MHz), 48.5% power consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work while 30% of the area is saved with recognition accuracy of 90.9%. This chip can maximally process 2.4× faster than real-time at 200 MHz and 1.1 V with power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing is increased to 97.4 mW and the max-performance is decreased to 2.08× because of the increased computation workload.
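The speed figures in the summary can be cross-checked with simple arithmetic. The sketch below is not from the paper: it assumes that recognition throughput scales linearly with clock frequency, and the function and constant names are hypothetical. Under that assumption it compares the 200 MHz maximum clock with the 83.3 MHz real-time clock, and derives the real-time clock that the wider-beam figure would imply, a value the summary does not state.

```python
# Figures quoted in the summary.
F_MAX_MHZ = 200.0       # maximum clock frequency at 1.1 V
F_REALTIME_MHZ = 83.3   # clock required for real-time operation (default beam)
WIDE_BEAM_SPEEDUP = 2.08  # maximum speed-up quoted for the wider beam


def speedup(f_max_mhz: float, f_realtime_mhz: float) -> float:
    """Times faster than real time, ASSUMING throughput is linear in clock."""
    return f_max_mhz / f_realtime_mhz


def implied_realtime_clock(f_max_mhz: float, speedup_at_max: float) -> float:
    """Real-time clock implied by a quoted speed-up, under the same assumption."""
    return f_max_mhz / speedup_at_max


default_speedup = speedup(F_MAX_MHZ, F_REALTIME_MHZ)
wide_beam_clock = implied_realtime_clock(F_MAX_MHZ, WIDE_BEAM_SPEEDUP)

print(f"default beam: {default_speedup:.2f}x real time")
print(f"wide beam: about {wide_beam_clock:.1f} MHz for real time (derived, not quoted)")
```

The first value comes out at about 2.40, which agrees with the 2.4× in the title. The second, roughly 96 MHz, is only an estimate that follows from the linear-scaling assumption.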

Publication: IEICE TRANSACTIONS on Electronics Vol.E96-C No.4 pp.444-453

Publication Date: 2013/04/01


Online ISSN: 1745-1353

DOI: 10.1587/transele.E96.C.444

Type of Manuscript: Special Section PAPER (Special Section on Solid-State Circuit Design—Architecture, Circuit, Device and Design Methodology)


Guangji HE, Takanobu SUGAHARA, Yuki MIYAMOTO, Shintaro IZUMI, Hiroshi KAWAGUCHI, Masahiko YOSHIMOTO, "A 168-mW 2.4×-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI" in IEICE TRANSACTIONS on Electronics, vol. E96-C, no. 4, pp. 444-453, April 2013, doi: 10.1587/transele.E96.C.444.
Abstract: This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm containing 2.52 M transistors for logic and 4.29 Mbit on-chip memory. The measured results show that our implementation achieves 34.2% required frequency reduction (83.3 MHz), 48.5% power consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work while 30% of the area is saved with recognition accuracy of 90.9%. This chip can maximally process 2.4× faster than real-time at 200 MHz and 1.1 V with power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing is increased to 97.4 mW and the max-performance is decreased to 2.08× because of the increased computation workload.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/transele.E96.C.444/_p


@ARTICLE{e96-c_4_444,
author={Guangji HE and Takanobu SUGAHARA and Yuki MIYAMOTO and Shintaro IZUMI and Hiroshi KAWAGUCHI and Masahiko YOSHIMOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={A 168-mW 2.4$\times$-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI},
year={2013},
volume={E96-C},
number={4},
pages={444--453},
abstract={This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm $\times$ 2.18 mm containing 2.52 M transistors for logic and 4.29 Mbit on-chip memory. The measured results show that our implementation achieves 34.2% required frequency reduction (83.3 MHz), 48.5% power consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work while 30% of the area is saved with recognition accuracy of 90.9%. This chip can maximally process 2.4$\times$ faster than real-time at 200 MHz and 1.1 V with power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing is increased to 97.4 mW and the max-performance is decreased to 2.08$\times$ because of the increased computation workload.},
keywords={},
doi={10.1587/transele.E96.C.444},
ISSN={1745-1353},
month={April},}


TY - JOUR
TI - A 168-mW 2.4×-Real-Time 60-kWord Continuous Speech Recognition Processor VLSI
T2 - IEICE TRANSACTIONS on Electronics
SP - 444
EP - 453
AU - Guangji HE
AU - Takanobu SUGAHARA
AU - Yuki MIYAMOTO
AU - Shintaro IZUMI
AU - Hiroshi KAWAGUCHI
AU - Masahiko YOSHIMOTO
PY - 2013
DO - 10.1587/transele.E96.C.444
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E96-C
IS - 4
JA - IEICE TRANSACTIONS on Electronics
Y1 - April 2013
AB - This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme to reduce the external memory bandwidth for Gaussian Mixture Model (GMM) computation and multi-path Viterbi transition units. We optimize the internal SRAM size using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm containing 2.52 M transistors for logic and 4.29 Mbit on-chip memory. The measured results show that our implementation achieves 34.2% required frequency reduction (83.3 MHz), 48.5% power consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work while 30% of the area is saved with recognition accuracy of 90.9%. This chip can maximally process 2.4× faster than real-time at 200 MHz and 1.1 V with power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing is increased to 97.4 mW and the max-performance is decreased to 2.08× because of the increased computation workload.
ER -