
Keyword Search Result

[Keyword] hidden Markov model (HMM) (7 hits)

1-7 of 7 hits
  • A 168-mW 2.4× Real-Time 60-kWord Continuous Speech Recognition Processor VLSI

    Guangji HE  Takanobu SUGAHARA  Yuki MIYAMOTO  Shintaro IZUMI  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

      Vol: E96-C  No: 4  Page(s): 444-453

    This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). It features a compression-decoding scheme that reduces the external memory bandwidth for Gaussian Mixture Model (GMM) computation, together with multi-path Viterbi transition units. We optimize the internal SRAM size by using the max-approximation GMM calculation and adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm and contains 2.52 M transistors for logic and 4.29 Mbit of on-chip memory. The measured results show that our implementation achieves a 34.2% required-frequency reduction (83.3 MHz) and a 48.5% power-consumption reduction (74.14 mW) for 60-kWord real-time continuous speech recognition compared to the previous work, while saving 30% of the area with a recognition accuracy of 90.9%. This chip can process up to 2.4× faster than real time at 200 MHz and 1.1 V with a power consumption of 168 mW. By increasing the beam width, better recognition accuracy (91.45%) can be achieved. In that case, the power consumption for real-time processing increases to 97.4 mW and the maximum performance decreases to 2.08× because of the increased computation workload.
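    To make the max-approximation GMM calculation mentioned above concrete, here is a minimal Python sketch comparing the exact log-sum-exp GMM log-likelihood of one feature frame with the max-over-mixtures approximation; the function name, argument shapes, and diagonal-covariance assumption are illustrative, not taken from the paper.

    import numpy as np

    def gmm_log_likelihood(x, weights, means, inv_vars, use_max_approx=True):
        """Log-likelihood of one frame x (D,) under a diagonal-covariance GMM with M mixtures."""
        # Per-mixture log densities; the Gaussian normalization is folded into log_consts.
        log_consts = np.log(weights) - 0.5 * np.sum(np.log(2 * np.pi / inv_vars), axis=1)
        log_dens = log_consts - 0.5 * np.sum(((x - means) ** 2) * inv_vars, axis=1)
        if use_max_approx:
            return np.max(log_dens)            # max-approximation: keep only the best mixture
        return np.logaddexp.reduce(log_dens)   # exact log-sum-exp, for comparison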

  • A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition

    Kazuhiro NAKAMURA  Ryo SHIMAZAKI  Masatoshi YAMAMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER

      Vol: E95-C  No: 4  Page(s): 456-467

    This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.
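    As a minimal sketch of the two computations named above, the following Python fragment takes per-frame output probabilities (OPC results) for one word HMM and computes a likelihood score (LSC) with a Viterbi pass; the left-to-right topology, array shapes, and the name word_likelihood are assumptions for illustration only.

    import numpy as np

    def word_likelihood(obs_logprobs, log_trans):
        """obs_logprobs: (T, S) log output probabilities; log_trans: (S, S) log transition matrix."""
        T, S = obs_logprobs.shape
        score = np.full(S, -np.inf)
        score[0] = obs_logprobs[0, 0]                # assume the model starts in state 0
        for t in range(1, T):
            # Viterbi step: best predecessor for each state, then add the emission term.
            score = np.max(score[:, None] + log_trans, axis=0) + obs_logprobs[t]
        return score[S - 1]                          # assume the model must end in the last state

    In isolated word recognition, such a score would be computed for every word model and the highest-scoring word reported.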

  • VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

    Hiroki NOGUCHI  Kazuo MIURA  Tsuyoshi FUJINAGA  Takanobu SUGAHARA  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER

      Vol: E94-C  No: 4  Page(s): 458-467

    We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-kWord real-time continuous speech recognition. Our architecture includes a cache architecture that exploits the locality of speech-recognition processing, beam pruning with a dynamic threshold, two-stage language-model searching, a parallel Gaussian Mixture Model (GMM) architecture parallelized at the mixture and frame levels, a parallel Viterbi architecture, and pipelined operation between Viterbi transitions and GMM processing. Results show that our architecture achieves an 88.24% required-frequency reduction (66.74 MHz) and an 84.04% memory-bandwidth reduction (549.91 MB/s) for real-time 60-kWord continuous speech recognition.
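    The beam pruning with a dynamic threshold can be pictured with a short Python sketch: after each frame, only hypotheses within a fixed beam of the current best score survive, so the pruning threshold moves with the search; the token container and the beam value here are illustrative assumptions.

    def prune_with_dynamic_threshold(tokens, beam_width):
        """tokens: dict mapping state id -> log score; returns the surviving tokens."""
        if not tokens:
            return tokens
        best = max(tokens.values())
        threshold = best - beam_width            # threshold adapts to the best hypothesis each frame
        return {state: score for state, score in tokens.items() if score >= threshold}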

  • A VLSI Architecture for Output Probability Computations of HMM-Based Recognition Systems with Store-Based Block Parallel Processing

    Kazuhiro NAKAMURA  Masatoshi YAMAMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-VLSI Systems

      Vol: E93-D  No: 2  Page(s): 300-305

    In this paper, a fast and memory-efficient VLSI architecture for output probability computations of continuous Hidden Markov Models (HMMs) is presented. These computations are the most time-consuming part of HMM-based recognition systems. High-speed VLSI architectures with small register counts and low power dissipation are required for the development of mobile embedded systems with capable human interfaces. We demonstrate store-based block parallel processing (StoreBPP) for output probability computations and present a VLSI architecture that supports it. When the number of HMM states is adequate for accurate recognition, the proposed architecture requires fewer registers and processing elements and less processing time than conventional stream-based block parallel processing (StreamBPP) architectures. The processing elements used in the StreamBPP architecture are identical to those used in the StoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows the efficiency of the proposed architecture through its efficient use of registers for storing input feature vectors and intermediate results during computation.
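    A rough Python sketch of the store-based loop ordering, under the assumption that a block of input feature vectors is held locally and reused across all HMM states rather than re-fetched for every state; the block size, shapes, and max-over-mixtures output probability are illustrative, not the paper's exact scheme.

    import numpy as np

    def blocked_output_probs(frames, means, inv_vars, log_consts, block=4):
        """frames: (T, D); means, inv_vars: (S, M, D); log_consts: (S, M) -> (T, S) log probs."""
        T, S = frames.shape[0], means.shape[0]
        out = np.empty((T, S))
        for t0 in range(0, T, block):
            x = frames[t0:t0 + block]                            # stored block, reused below
            for s in range(S):                                   # every state reads the same stored block
                d2 = (x[:, None, :] - means[s]) ** 2             # (b, M, D)
                log_dens = log_consts[s] - 0.5 * np.sum(d2 * inv_vars[s], axis=2)
                out[t0:t0 + block, s] = np.max(log_dens, axis=1) # max over mixtures
        return out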

  • A Systolic FPGA Architecture of Two-Level Dynamic Programming for Connected Speech Recognition

    Yong KIM  Hong JEONG  

     
    PAPER-Speech and Hearing

      Vol: E90-D  No: 2  Page(s): 562-568

    In this paper, we present an efficient architecture for connected word recognition that can be implemented on a field-programmable gate array (FPGA). The architecture consists of newly derived two-level dynamic programming (TLDP) that uses only bit-addition and shift operations. The advantages of this architecture are its spatial efficiency, which accommodates more words in limited space, and its absence of multiplications, which increases computational speed by reducing propagation delays. The architecture is highly regular, consisting of identical and simple processing elements with only nearest-neighbor communication; external communication occurs through the end processing elements. To verify the proposed architecture, we have also designed and implemented a prototype using Xilinx FPGAs running at 33 MHz.
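    The two-level dynamic programming can be sketched in Python at the algorithmic level (ignoring the bit-addition/shift arithmetic of the hardware): level 1 is assumed to provide a score for matching a word against any frame span, and level 2 chains words across the utterance. word_span_score and all other names here are hypothetical placeholders.

    def two_level_dp(num_frames, words, word_span_score):
        """word_span_score(w, s, e) -> log score of word w over frames [s, e); returns best score and word sequence."""
        NEG_INF = float("-inf")
        best = [NEG_INF] * (num_frames + 1)      # best[e]: best score covering frames [0, e)
        back = [None] * (num_frames + 1)         # backpointer: (start frame, word)
        best[0] = 0.0
        for e in range(1, num_frames + 1):       # level 2: word-level DP over end frames
            for s in range(e):
                if best[s] == NEG_INF:
                    continue
                for w in words:
                    score = best[s] + word_span_score(w, s, e)   # level-1 result for this span
                    if score > best[e]:
                        best[e], back[e] = score, (s, w)
        seq, e = [], num_frames                  # backtrace the best word sequence
        while e > 0 and back[e] is not None:
            s, w = back[e]
            seq.append(w)
            e = s
        return best[num_frames], list(reversed(seq))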

  • Recognition of Alphabetical Hand Gestures Using Hidden Markov Model

    Ho-Sub YOON  Jung SOH  Byung-Woo MIN  Hyun Seung YANG  

     
    PAPER-Neural Networks

      Vol: E82-A  No: 7  Page(s): 1358-1366

    The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help achieve easy and natural comprehension for HCI. Many methods for hand gesture recognition using visual analysis have been proposed, such as syntactical analysis, neural networks (NNs), and hidden Markov models (HMMs). In our research, HMMs are proposed for alphabetical hand gesture recognition. In the preprocessing stage, the proposed approach consists of three procedures: hand localization, hand tracking, and gesture spotting. The hand localization procedure detects candidate regions on the basis of skin color and motion in an image, using color-histogram matching and time-varying edge-difference techniques. The hand tracking algorithm finds the centroid of the moving hand region, connects those centroids, and produces a trajectory. The spotting algorithm divides the trajectory into real and meaningless gestures. In constructing a feature database, the proposed approach uses the weighted ρ-φ-ν feature code and employs a k-means algorithm for the HMM codebook. In our experiments, 1,300 alphabetical and 1,300 untrained gestures are used for training and testing, respectively. The experimental results demonstrate that the proposed approach yields a high and satisfactory recognition rate for images with different sizes, shapes, and skew angles.
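    The codebook step can be illustrated with a small Python sketch: each weighted ρ-φ-ν feature vector of the trajectory is mapped to its nearest k-means codeword, giving the discrete observation sequence consumed by the HMMs; the shapes and the function name quantize are illustrative assumptions.

    import numpy as np

    def quantize(features, codebook):
        """features: (T, D) trajectory feature vectors; codebook: (K, D) k-means centers -> (T,) codeword indices."""
        d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)   # squared distances (T, K)
        return np.argmin(d2, axis=1)                                            # nearest codeword per frame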

  • A Frame-Dependent Fuzzy Compensation Method for Speech Recognition over Time-Varying Telephone Channels

    Wei-Wen HUNG  Hsiao-Chuan WANG  

     
    PAPER-Speech Processing and Acoustics

      Vol: E82-D  No: 2  Page(s): 431-438

    Speech signals transmitted over a telephone network often suffer from interference due to ambient noise and channel distortion. In this paper, a novel frame-dependent fuzzy channel compensation (FD-FCC) method employing two-stage bias subtraction is proposed to minimize the channel effect. First, through maximum-likelihood (ML) estimation over the set of all word models, we choose the word model that best matches the input utterance. Then, based upon this word model, a set of mixture biases is derived by averaging the cepstral differences between the input utterance and the chosen model. In the second stage, instead of using a single bias, a frame-dependent bias is calculated for each input frame to equalize the channel variations in the input utterance. This frame-dependent bias is obtained as a convex combination of the mixture biases weighted by a fuzzy membership function. Experimental results show that the channel effect can be effectively canceled even when additive background noise is present in a telephone speech recognition system.
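    The second-stage, frame-dependent bias can be sketched in Python as follows, assuming the mixture biases from stage one are already available; the softmax-style membership used here is only an illustrative stand-in for the paper's fuzzy membership function, and all names and shapes are assumptions.

    import numpy as np

    def compensate_frame(x, mixture_means, mixture_biases, fuzziness=1.0):
        """x: (D,) cepstral frame; mixture_means, mixture_biases: (M, D) -> compensated frame."""
        d2 = np.sum((x - mixture_means) ** 2, axis=1)       # distance of the frame to each mixture
        w = np.exp(-(d2 - d2.min()) / fuzziness)            # fuzzy membership weights (illustrative)
        w /= w.sum()                                        # convex combination weights
        return x - w @ mixture_biases                       # subtract the frame-dependent bias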