
Keyword Search Result

[Keyword] hidden Markov model (HMM) (7 hits)

1-7 of 7 hits
  • A 168-mW 2.4× Real-Time 60-kWord Continuous Speech Recognition Processor VLSI

    Guangji HE  Takanobu SUGAHARA  Yuki MIYAMOTO  Shintaro IZUMI  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

      Vol: E96-C  No: 4  Page(s): 444-453

    This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme that reduces the external memory bandwidth for Gaussian Mixture Model (GMM) computation, together with multi-path Viterbi transition units. We optimize the internal SRAM size by using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm and contains 2.52 M transistors for logic and 4.29 Mbit of on-chip memory. The measured results show that our implementation achieves a 34.2% required-frequency reduction (83.3 MHz) and a 48.5% power-consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work, while saving 30% of the area with a recognition accuracy of 90.9%. This chip can process up to 2.4× faster than real time at 200 MHz and 1.1 V with a power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing increases to 97.4 mW and the maximum performance decreases to 2.08× because of the increased computation workload.
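    To make the max-approximation GMM calculation mentioned above concrete, here is a minimal Python sketch comparing the exact log-sum-exp GMM log-likelihood of one feature frame with the max-over-mixtures approximation; the function name, argument shapes, and diagonal-covariance assumption are illustrative, not taken from the paper.

    import numpy as np

    def gmm_log_likelihood(x, weights, means, inv_vars, use_max_approx=True):
        """Log-likelihood of one frame x (D,) under a diagonal-covariance GMM with M mixtures."""
        # Per-mixture log densities; the Gaussian normalization is folded into log_consts.
        log_consts = np.log(weights) - 0.5 * np.sum(np.log(2 * np.pi / inv_vars), axis=1)
        log_dens = log_consts - 0.5 * np.sum(((x - means) ** 2) * inv_vars, axis=1)
        if use_max_approx:
            return np.max(log_dens)            # max-approximation: keep only the best mixture
        return np.logaddexp.reduce(log_dens)   # exact log-sum-exp, for comparison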

  • A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition

    Kazuhiro NAKAMURA  Ryo SHIMAZAKI  Masatoshi YAMAMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER

      Vol: E95-C  No: 4  Page(s): 456-467

    This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.
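    As a minimal sketch of the two computations named above, the following Python fragment takes per-frame output probabilities (OPC results) for one word HMM and computes a likelihood score (LSC) with a Viterbi pass; the left-to-right topology, array shapes, and the name word_likelihood are assumptions for illustration only.

    import numpy as np

    def word_likelihood(obs_logprobs, log_trans):
        """obs_logprobs: (T, S) log output probabilities; log_trans: (S, S) log transition matrix."""
        T, S = obs_logprobs.shape
        score = np.full(S, -np.inf)
        score[0] = obs_logprobs[0, 0]                # assume the model starts in state 0
        for t in range(1, T):
            # Viterbi step: best predecessor for each state, then add the emission term.
            score = np.max(score[:, None] + log_trans, axis=0) + obs_logprobs[t]
        return score[S - 1]                          # assume the model must end in the last state

    In isolated word recognition, such a score would be computed for every word model and the highest-scoring word reported.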

  • VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

    Hiroki NOGUCHI  Kazuo MIURA  Tsuyoshi FUJINAGA  Takanobu SUGAHARA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

      Vol: E94-C  No: 4  Page(s): 458-467

    We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-kWord real-time continuous speech recognition. Our architecture includes a cache architecture that exploits the locality of speech-recognition processing, beam pruning with a dynamic threshold, two-stage language-model searching, a parallel Gaussian Mixture Model (GMM) architecture parallelized at the mixture and frame levels, a parallel Viterbi architecture, and pipelined operation between Viterbi transitions and GMM processing. Results show that our architecture achieves an 88.24% required-frequency reduction (66.74 MHz) and an 84.04% memory-bandwidth reduction (549.91 MB/s) for real-time 60-kWord continuous speech recognition.
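    The beam pruning with a dynamic threshold can be pictured with a short Python sketch: after each frame, only hypotheses within a fixed beam of the current best score survive, so the pruning threshold moves with the search; the token container and the beam value here are illustrative assumptions.

    def prune_with_dynamic_threshold(tokens, beam_width):
        """tokens: dict mapping state id -> log score; returns the surviving tokens."""
        if not tokens:
            return tokens
        best = max(tokens.values())
        threshold = best - beam_width            # threshold adapts to the best hypothesis each frame
        return {state: score for state, score in tokens.items() if score >= threshold}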

  • A VLSI Architecture for Output Probability Computations of HMM-Based Recognition Systems with Store-Based Block Parallel Processing

    Kazuhiro NAKAMURA  Masatoshi YAMAMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-VLSI Systems

      Vol: E93-D  No: 2  Page(s): 300-305

    In this paper, a fast and memory-efficient VLSI architecture for output probability computations of continuous Hidden Markov Models (HMMs) is presented. These computations are the most time-consuming part of HMM-based recognition systems. High-speed VLSI architectures with small register counts and low power dissipation are required for the development of mobile embedded systems with capable human interfaces. We demonstrate store-based block parallel processing (StoreBPP) for output probability computations and present a VLSI architecture that supports it. When the number of HMM states is adequate for accurate recognition, the proposed architecture requires fewer registers and processing elements and less processing time than conventional stream-based block parallel processing (StreamBPP) architectures. The processing elements used in the StreamBPP architecture are identical to those used in the StoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows the efficiency of the proposed architecture through its efficient use of registers for storing input feature vectors and intermediate results during computation.
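    A rough Python sketch of the store-based loop ordering, under the assumption that a block of input feature vectors is held locally and reused across all HMM states rather than re-fetched for every state; the block size, shapes, and max-over-mixtures output probability are illustrative, not the paper's exact scheme.

    import numpy as np

    def blocked_output_probs(frames, means, inv_vars, log_consts, block=4):
        """frames: (T, D); means, inv_vars: (S, M, D); log_consts: (S, M) -> (T, S) log probs."""
        T, S = frames.shape[0], means.shape[0]
        out = np.empty((T, S))
        for t0 in range(0, T, block):
            x = frames[t0:t0 + block]                            # stored block, reused below
            for s in range(S):                                   # every state reads the same stored block
                d2 = (x[:, None, :] - means[s]) ** 2             # (b, M, D)
                log_dens = log_consts[s] - 0.5 * np.sum(d2 * inv_vars[s], axis=2)
                out[t0:t0 + block, s] = np.max(log_dens, axis=1) # max over mixtures
        return out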

  • A Systolic FPGA Architecture of Two-Level Dynamic Programming for Connected Speech Recognition

    Yong KIM  Hong JEONG  

     
    PAPER-Speech and Hearing

      Vol: E90-D  No: 2  Page(s): 562-568

    In this paper, we present an efficient architecture for connected word recognition that can be implemented on a field-programmable gate array (FPGA). The architecture consists of newly derived two-level dynamic programming (TLDP) that uses only bit-addition and shift operations. The advantages of this architecture are its spatial efficiency, which accommodates more words in limited space, and its absence of multiplications, which increases computational speed by reducing propagation delays. The architecture is highly regular, consisting of identical and simple processing elements with only nearest-neighbor communication; external communication occurs through the end processing elements. To verify the proposed architecture, we have also designed and implemented a prototype using Xilinx FPGAs running at 33 MHz.
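    The two-level dynamic programming can be sketched in Python at the algorithmic level (ignoring the bit-addition/shift arithmetic of the hardware): level 1 is assumed to provide a score for matching a word against any frame span, and level 2 chains words across the utterance. word_span_score and all other names here are hypothetical placeholders.

    def two_level_dp(num_frames, words, word_span_score):
        """word_span_score(w, s, e) -> log score of word w over frames [s, e); returns best score and word sequence."""
        NEG_INF = float("-inf")
        best = [NEG_INF] * (num_frames + 1)      # best[e]: best score covering frames [0, e)
        back = [None] * (num_frames + 1)         # backpointer: (start frame, word)
        best[0] = 0.0
        for e in range(1, num_frames + 1):       # level 2: word-level DP over end frames
            for s in range(e):
                if best[s] == NEG_INF:
                    continue
                for w in words:
                    score = best[s] + word_span_score(w, s, e)   # level-1 result for this span
                    if score > best[e]:
                        best[e], back[e] = score, (s, w)
        seq, e = [], num_frames                  # backtrace the best word sequence
        while e > 0 and back[e] is not None:
            s, w = back[e]
            seq.append(w)
            e = s
        return best[num_frames], list(reversed(seq))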

  • Recognition of Alphabetical Hand Gestures Using Hidden Markov Model

    Ho-Sub YOON  Jung SOH  Byung-Woo MIN  Hyun Seung YANG  

     
    PAPER-Neural Networks

      Vol: E82-A  No: 7  Page(s): 1358-1366

    The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help achieve easy and natural comprehension for HCI. Many methods for hand gesture recognition using visual analysis have been proposed, such as syntactical analysis, neural networks (NNs), and hidden Markov models (HMMs). In our research, HMMs are proposed for alphabetical hand gesture recognition. In the preprocessing stage, the proposed approach consists of three procedures: hand localization, hand tracking, and gesture spotting. The hand localization procedure detects candidate regions on the basis of skin color and motion in an image, using color-histogram matching and time-varying edge-difference techniques. The hand tracking algorithm finds the centroid of the moving hand region, connects those centroids, and produces a trajectory. The spotting algorithm divides the trajectory into real and meaningless gestures. In constructing a feature database, the proposed approach uses the weighted ρ-φ-ν feature code and employs a k-means algorithm for the HMM codebook. In our experiments, 1,300 alphabetical and 1,300 untrained gestures are used for training and testing, respectively. The experimental results demonstrate that the proposed approach yields a high and satisfactory recognition rate for images with different sizes, shapes, and skew angles.
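    The codebook step can be illustrated with a small Python sketch: each weighted ρ-φ-ν feature vector of the trajectory is mapped to its nearest k-means codeword, giving the discrete observation sequence consumed by the HMMs; the shapes and the function name quantize are illustrative assumptions.

    import numpy as np

    def quantize(features, codebook):
        """features: (T, D) trajectory feature vectors; codebook: (K, D) k-means centers -> (T,) codeword indices."""
        d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)   # squared distances (T, K)
        return np.argmin(d2, axis=1)                                            # nearest codeword per frame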

  • A Frame-Dependent Fuzzy Compensation Method for Speech Recognition over Time-Varying Telephone Channels

    Wei-Wen HUNG  Hsiao-Chuan WANG  

     
    PAPER-Speech Processing and Acoustics

      Vol: E82-D  No: 2  Page(s): 431-438

    Speech signals transmitted over a telephone network often suffer from interference due to ambient noise and channel distortion. In this paper, a novel frame-dependent fuzzy channel compensation (FD-FCC) method employing two-stage bias subtraction is proposed to minimize the channel effect. First, through maximum-likelihood (ML) estimation over the set of all word models, we choose the word model that best matches the input utterance. Then, based upon this word model, a set of mixture biases is derived by averaging the cepstral differences between the input utterance and the chosen model. In the second stage, instead of using a single bias, a frame-dependent bias is calculated for each input frame to equalize the channel variations in the input utterance. This frame-dependent bias is obtained as a convex combination of the mixture biases weighted by a fuzzy membership function. Experimental results show that the channel effect can be effectively canceled even when additive background noise is present in a telephone speech recognition system.
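    The second-stage, frame-dependent bias can be sketched in Python as follows, assuming the mixture biases from stage one are already available; the softmax-style membership used here is only an illustrative stand-in for the paper's fuzzy membership function, and all names and shapes are assumptions.

    import numpy as np

    def compensate_frame(x, mixture_means, mixture_biases, fuzziness=1.0):
        """x: (D,) cepstral frame; mixture_means, mixture_biases: (M, D) -> compensated frame."""
        d2 = np.sum((x - mixture_means) ** 2, axis=1)       # distance of the frame to each mixture
        w = np.exp(-(d2 - d2.min()) / fuzziness)            # fuzzy membership weights (illustrative)
        w /= w.sum()                                        # convex combination weights
        return x - w @ mixture_biases                       # subtract the frame-dependent bias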