The search functionality is under construction.

Keyword Search Result

[Keyword] Hidden Markov Model(71hit)

41-60hit(71hit)

  • What HMMs Can Do

    Jeff A. BILMES  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    869-891

    Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial article analyzes HMMs by exploring a definition of HMMs in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in search of a model to supersede the HMM (say for ASR), rather than trying to correct for HMM limitations in the general case, new models should be found based on their potential for better parsimony, computational requirements, and noise insensitivity.
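
    As a brief illustration of what a definition "in terms of random variables and conditional independence assumptions" looks like, the standard textbook form can be written as follows. The notation ($Q_t$ for the hidden state and $X_t$ for the observation at time $t$, with $T$ frames) is assumed here and is not taken from the abstract.

```latex
\begin{align}
  p(q_t \mid q_{1:t-1}, x_{1:t-1}) &= p(q_t \mid q_{t-1}), \\
  p(x_t \mid q_{1:T}, x_{1:t-1}, x_{t+1:T}) &= p(x_t \mid q_t), \\
  p(x_{1:T}, q_{1:T}) &= p(q_1) \prod_{t=2}^{T} p(q_t \mid q_{t-1}) \prod_{t=1}^{T} p(x_t \mid q_t), \\
  p(x_{1:T}) &= \sum_{q_{1:T}} p(x_{1:T}, q_{1:T}).
\end{align}
```

    The first two lines are the conditional independence assumptions; the third is the joint factorization they imply, and the last is the observation likelihood obtained by summing over all hidden state sequences.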

  • Training Augmented Models Using SVMs

    Mark J.F. GALES  Martin I. LAYTON  

     
    INVITED PAPER

      Vol:
    E89-D No:3
      Page(s):
    892-899

    There has been significant interest in developing new forms of acoustic model, in particular models which allow more dependencies to be represented than those contained within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here, a local exponential approximation is made about some point on a base model. This allows more dependencies within the data to be modelled than are represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach used for allowing support vector machines (SVMs) to be applied to variable length data. The training of augmented statistical models within an SVM generative-kernel framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated using rescoring on a large vocabulary speech recognition task.

  • Genetic Algorithm Based Optimization of Partly-Hidden Markov Model Structure Using Discriminative Criterion

    Tetsuji OGAWA  Tetsunori KOBAYASHI  

     
    PAPER-Speech Recognition

      Vol:
    E89-D No:3
      Page(s):
    939-945

    Discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation-dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, it is expected that the optimal structure which gives the best performance differs from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM is optimally defined for each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore it gives model structures with good discriminative performance. Among all possible structure combinations, we define the combination that satisfies the WLRM criterion as the optimal structures. A genetic algorithm is also applied as an adequate approximation of a full search. Results of continuous lecture talk speech recognition show the effectiveness of the proposed structure optimization: it reduced the word errors compared to HMM and PHMM with a common structure for all models.

  • A Novel Approach for Modeling a Hybrid ARQ (Automatic Repeat Request) Based on the Hidden Markov Model

    Yong Ho KIM  Tae Yong KIM  Young Yong KIM  

     
    LETTER-Network

      Vol:
    E88-B No:9
      Page(s):
    3772-3775

    In this letter, we propose a novel approach for use in the analytical modeling of the overall performance of a Hybrid ARQ (type I and II) together with an arbitrary channel model, based on the Hidden Markov Model (HMM). Using the combined HMM developed for the involved ARQ protocols with the finite state channel model, such critical performance measures as throughput and delay can be derived in closed form. Analytical results are derived for Stop-and-Wait as well as Go-back-N type together with the type I and type II Hybrid ARQ scheme adopted. We compare the analytical results with the simulation results in order to check the correctness of our model, and show the efficiency of our approach by applying it to realistic environments such as the CDMA IS-95 system with its derived equations.

  • Extension of Hidden Markov Models for Multiple Candidates and Its Application to Gesture Recognition

    Yosuke SATO  Tetsuji OGAWA  Tetsunori KOBAYASHI  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E88-D No:6
      Page(s):
    1239-1247

    We propose a modified Hidden Markov Model (HMM) with a view to improving gesture recognition using a moving camera. The conventional HMM is formulated so as to deal with only one feature candidate per frame. However, for a mobile robot, the background and the lighting conditions are always changing, and the feature extraction problem becomes difficult. It is almost impossible to extract a reliable feature vector under such conditions. In this paper, we define a new gesture recognition framework in which multiple candidates of feature vectors are generated with confidence measures and the HMM is extended to deal with these multiple feature vectors. Experimental results comparing the proposed system with feature vectors based on DCT and the method of selecting only one candidate feature point verify the effectiveness of the proposed technique.

  • Level-Building on AdaBoost HMM Classifiers and the Application to Visual Speech Processing

    Liang DONG  Say-Wei FOO  Yong LIAN  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:11
      Page(s):
    2460-2471

    The Hidden Markov Model (HMM) is a popular statistical framework for modeling and analyzing stochastic signals. In this paper, a novel strategy is proposed that makes use of the level-building algorithm with a chain of AdaBoost HMM classifiers to model long stochastic processes. An AdaBoost HMM classifier belongs to the class of multiple-HMM classifiers. It is specially trained to identify samples with erratic distributions. By connecting the AdaBoost HMM classifiers, processes of arbitrary length can be modeled. A probability trellis is created to store the accumulated probabilities, starting frames and indices of each reference model. By backtracking the trellis, a sequence of best-matched AdaBoost HMM classifiers can be decoded. The proposed method is applied to visual speech processing. A selected number of words and phrases are decomposed into sequences of visual speech units using both the proposed strategy and the conventional level-building on HMM method. Experimental results show that the proposed strategy is able to more accurately decompose words/phrases in visual speech than the conventional approach.

  • Target Identification from Multi-Aspect High Range-Resolution Radar Signatures Using a Hidden Markov Model

    Masahiko NISHIMOTO  Xuejun LIAO  Lawrence CARIN  

     
    PAPER-Electromagnetic Theory

      Vol:
    E87-C No:10
      Page(s):
    1706-1714

    Identification of targets using sequential high range-resolution (HRR) radar signatures is studied. Classifiers are designed by using hidden Markov models (HMMs) to characterize the sequential information in multi-aspect HRR signatures. The higher-order moments together with the target dimension and the number of dominant wavefronts are used as features of the transient HRR waveforms. Classification results are presented for the ten-target MSTAR data set. The example results show that good classification performance and robustness are obtained, although the target features used here are very simple and compact compared with the complex HRR signatures.

  • Person Recognition Method Using Sequential Walking Footprints via Overlapped Foot Shape and Center-of-Pressure Trajectory

    Jin-Woo JUNG  Zeungnam BIEN  Tomomasa SATO  

     
    PAPER

      Vol:
    E87-A No:6
      Page(s):
    1393-1400

    Many diverse methods have been developed in the field of biometric identification as a greater emphasis is placed on human-friendliness in the area of intelligent systems. One emerging method is the use of human footprints. However, in previous research, there were some limitations resulting from the spatial resolution of sensors. One possible method to overcome this limitation is through the use of additional information such as dynamic walking information in dynamic footprints. In this study, we suggest a new person recognition scheme based on overlapped foot shape and COP (Center Of Pressure) trajectory during one-step walking. We also show the usefulness of the suggested method, obtaining a 98.6% recognition rate in our experiment with eleven people.

  • An Adaptive Visual Attentive Tracker with HMM-Based TD Learning Capability for Human Intended Behavior

    Minh Anh Thi HO  Yoji YAMADA  Yoji UMETANI  

     
    PAPER-Artificial Intelligence, Cognitive Science

      Vol:
    E86-D No:6
      Page(s):
    1051-1058

    In this study, we build a system called Adaptive Visual Attentive Tracker (AVAT) for the purpose of developing a non-verbal communication channel between the system and an operator who presents intended movements. In the system, we constructed an HMM (Hidden Markov Model)-based TD (Temporal Difference) learning algorithm to track and zoom in on an operator's behavioral sequence which represents his/her intention. AVAT extracts human intended movements from ordinary walking behavior based on the following two algorithms: the first is to model the movements of human body parts using an HMM algorithm, and the second is to learn the model of the tracker's action using a model-based TD learning algorithm. In the paper, we describe the integrated algorithm of the above two methods, whose linkage is established by assigning the state transition probability in the HMM as a reward in TD learning. Experimental results of extracting an operator's hand sign action sequence during her natural walking motion are shown, which demonstrate the function of AVAT as it is developed within the framework of perceptual organization. Identification of the sign gesture context through wavelet analysis autonomously provides a reward value for optimizing AVAT's action patterns.

  • Gesture Recognition Using HLAC Features of PARCOR Images

    Takio KURITA  Satoru HAYAMIZU  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E86-D No:4
      Page(s):
    719-726

    This paper proposes a gesture recognition method which uses higher order local autocorrelation (HLAC) features extracted from PARCOR images. To extract dominant information from a sequence of images, we apply a linear prediction coding technique to the sequence of pixel intensities, and PARCOR images are constructed from the PARCOR coefficients of the sequences of the pixel values. From the PARCOR images, HLAC features are extracted and the sequences of the features are used as the input vectors of the Hidden Markov Model (HMM) based recognizer. Since HLAC features are inherently shift-invariant and computationally inexpensive, the proposed method becomes robust to changes in the person's position and makes real-time gesture recognition possible. Experimental results of gesture recognition are shown to evaluate the performance of the proposed method.

  • On Automatic Speech Recognition at the Dawn of the 21st Century

    Chin-Hui LEE  

     
    INVITED SURVEY PAPER

      Vol:
    E86-D No:3
      Page(s):
    377-396

    In the last three decades of the 20th Century, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems for business automation, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder a widespread deployment of applications and services. On the one hand, fast progress was observed in statistical speech and language modeling. On the other hand, only spotty successes have been reported in applying knowledge sources in acoustics, speech and language science to improving speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration for incorporating technology advances in both statistical modeling and knowledge integration, in order to go beyond the current speech recognition limitations and benefit society in the 21st century.

  • A Highway Surveillance System Using an HMM-Based Segmentation Method

    Jien KATO  Toyohide WATANABE  Hiroyuki HASE  

     
    PAPER

      Vol:
    E85-D No:11
      Page(s):
    1767-1775

    Automatic traffic surveillance based on visual tracking techniques has been desired for many years. This paper proposes a basic highway surveillance system using an HMM-based segmentation method. The presented system meets the essential requirement of ITS: real-time running. Another advantage is its robustness to the shadows of moving objects, which have been recognized as one of the main obstacles to robust car tracking. At present, using the system we can estimate the velocity of vehicles with high accuracy. For acquiring metric information in the real world, the system does not require a precise calibration but only needs four point correspondences between the image plane and ground plane.

  • Multi-Space Probability Distribution HMM

    Keiichi TOKUDA  Takashi MASUKO  Noboru MIYAZAKI  Takao KOBAYASHI  

     
    INVITED PAPER-Pattern Recognition

      Vol:
    E85-D No:3
      Page(s):
    455-464

    This paper proposes a new kind of hidden Markov model (HMM) based on multi-space probability distribution, and derives a parameter estimation algorithm for the extended HMM. HMMs are widely used statistical models for characterizing sequences of speech spectra, and have been successfully applied to speech recognition systems. HMMs are categorized into discrete HMMs and continuous HMMs, which can model sequences of discrete symbols and continuous vectors, respectively. However, neither conventional discrete HMMs nor conventional continuous HMMs can be applied to observation sequences which consist of both continuous values and discrete symbols: F0 pattern modeling of speech is a good illustration. The proposed HMM includes the discrete HMM and the continuous HMM as special cases, and furthermore, can model sequences which consist of observation vectors with variable dimensionality and discrete symbols.

  • Proposal of an Adaptive Vision-Based Interactional Intention Inference System in Human/Robot Coexistence

    Minh Anh Thi HO  Yoji YAMADA  Takayuki SAKAI  Tetsuya MORIZONO  Yoji UMETANI  

     
    PAPER

      Vol:
    E84-D No:12
      Page(s):
    1596-1602

    The paper proposes a vision-based system for adaptively inferring the interactional intention of a person coming close to a robot, which plays an important role in the succeeding stage of human/robot cooperative handling of works/tools in production lines. Here, interactional intention is defined as the intention to interact/operate with the robot, which is proposed to be estimated from the path of the human head during an incipient period of time. To implement this intention inference capability, first, human entrance is detected and is modeled by an ellipse to supply information about the head position. Second, a B-spline technique is used to approximate the trajectory with reduced control points so that the system acquires information about the human motion direction and the curvature of the motion trajectory. Finally, Hidden Markov Models (HMMs) are applied as the adaptive inference engines at the stage of inferring the human interactional intention. The HMM algorithm with a stochastic pattern matching capability is extended to indicate whether or not a person has an intention toward the robot at the incipient time. The reestimation process here models the motion behavior of a human worker when he has or doesn't have the intention to operate the robot. Experimental results demonstrate the adaptability of the inference system using the extended HMM algorithm for filtering out motion deviation over the trajectory.

  • Lip Location Normalized Training for Visual Speech Recognition

    Oscar VANEGAS  Keiichi TOKUDA  Tadashi KITAMURA  

     
    PAPER-Speech and Hearing

      Vol:
    E83-D No:11
      Page(s):
    1969-1977

    This paper describes a method to normalize the lip position for improving the performance of a visual-information-based speech recognition system. Basically, there are two types of information useful in speech recognition processes: the first one is the speech signal itself and the second one is the visual information from the lips in motion. This paper tries to solve some problems caused by using images from the lips in motion, such as the effect produced by the variation of the lip location. The proposed lip location normalization method is based on a search algorithm of the lip position in which the location normalization is integrated into the model training. Experiments of speaker-independent isolated word recognition were carried out on the Tulips1 and M2VTS databases. Experiments showed a recognition rate of 74.5% and an error reduction rate of 35.7% for ten-digit word recognition on the M2VTS database.

  • Recognition of Alphabetical Hand Gestures Using Hidden Markov Model

    Ho-Sub YOON  Jung SOH  Byung-Woo MIN  Hyun Seung YANG  

     
    PAPER-Neural Networks

      Vol:
    E82-A No:7
      Page(s):
    1358-1366

    The use of hand gestures provides an attractive alternative to cumbersome interface devices for human-computer interaction (HCI). In particular, visual interpretation of hand gestures can help achieve easy and natural comprehension for HCI. Many methods for hand gesture recognition using visual analysis have been proposed, such as syntactical analysis, neural networks (NNs), and hidden Markov models (HMMs). In our research, HMMs are proposed for alphabetical hand gesture recognition. In the preprocessing stage, the proposed approach consists of three different procedures for hand localization, hand tracking and gesture spotting. The hand localization procedure detects the candidate regions on the basis of skin color and motion in an image by using color histogram matching and time-varying edge difference techniques. The hand tracking algorithm finds the centroid of a moving hand region, connects those centroids, and produces a trajectory. The spotting algorithm divides the trajectory into real and meaningless gestures. In constructing a feature database, the proposed approach uses the weighted ρ-φ-ν feature code, and employs a k-means algorithm for the codebook of the HMM. In our experiments, 1,300 alphabetical and 1,300 untrained gestures are used for training and testing, respectively. The experimental results demonstrate that the proposed approach yields a high and satisfactory recognition rate for images with different sizes, shapes and skew angles.

  • A Frame-Dependent Fuzzy Compensation Method for Speech Recognition over Time-Varying Telephone Channels

    Wei-Wen HUNG  Hsiao-Chuan WANG  

     
    PAPER-Speech Processing and Acoustics

      Vol:
    E82-D No:2
      Page(s):
    431-438

    Speech signals transmitted over telephone networks often suffer from interference due to ambient noise and channel distortion. In this paper, a novel frame-dependent fuzzy channel compensation (FD-FCC) method employing two-stage bias subtraction is proposed to minimize the channel effect. First, through maximum likelihood (ML) estimation over the set of all word models, we choose the word model which is best matched with the input utterance. Then, based upon this word model, a set of mixture biases can be derived by averaging the cepstral differences between the input utterance and the chosen model. In the second stage, instead of using a single bias, a frame-dependent bias is calculated for each input frame to equalize the channel variations in the input utterance. This frame-dependent bias is obtained as a convex combination of those mixture biases, weighted by a fuzzy membership function. Experimental results show that the channel effect can be effectively canceled even when additive background noise is present in a telephone speech recognition system.
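
    The second-stage idea, a per-frame bias formed as a convex combination of mixture biases, can be sketched roughly as below. This is only an illustration under stated assumptions: the abstract does not give the membership function, so a fuzzy c-means style membership based on Euclidean distance to each mixture mean is assumed, and all function and parameter names (`fuzzy_memberships`, `frame_dependent_bias`, `compensate`, `fuzziness`) are hypothetical.

```python
import math


def fuzzy_memberships(frame, mixture_means, fuzziness=2.0):
    """Return non-negative weights that sum to 1 (assumed fuzzy c-means form)."""
    dists = [math.dist(frame, mean) for mean in mixture_means]
    # If the frame coincides with a mixture mean, give that mixture full membership.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(dists))]
    exponent = 2.0 / (fuzziness - 1.0)
    inverse = [d ** (-exponent) for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def frame_dependent_bias(frame, mixture_means, mixture_biases, fuzziness=2.0):
    """Convex combination of the mixture biases, weighted by the memberships."""
    weights = fuzzy_memberships(frame, mixture_means, fuzziness)
    return [
        sum(w * bias[k] for w, bias in zip(weights, mixture_biases))
        for k in range(len(frame))
    ]


def compensate(frames, mixture_means, mixture_biases, fuzziness=2.0):
    """Subtract the frame-dependent bias from every cepstral frame."""
    compensated = []
    for frame in frames:
        bias = frame_dependent_bias(frame, mixture_means, mixture_biases, fuzziness)
        compensated.append([x - b for x, b in zip(frame, bias)])
    return compensated
```

    Because the weights are non-negative and sum to one, each frame's bias always lies within the convex hull of the mixture biases; a frame equidistant from two mixture means receives the average of their biases.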

  • A Hierarchical HMM Network-Based Approach for On-Line Recognition of Multi-Lingual Cursive Handwritings

    Jay June LEE  Jin Hyung KIM  Masayuki NAKAJIMA  

     
    PAPER-Image Processing, Computer Graphics and Pattern Recognition

      Vol:
    E81-D No:8
      Page(s):
    881-888

    Multi-lingual handwriting means script written in more than one language. In this paper, a hierarchical hidden Markov model network-based approach is proposed for on-line recognition of multi-lingual cursive handwritings. Basic characters of language, language network, and intermixed use of language are modeled with hierarchical relations. Since recognition corresponds to finding an optimal path in such a network, recognition candidates of each language are combined with probability without special treatment. Character labels of handwriting, language modes, and segmentation are obtained simultaneously. However, several difficulties caused by multiple languages occurred during recognition. The heuristic methods applied are a Markov chain for language mode transitions, pairwise discrimination for confusing pairs, and constrained routines for side effects of language-related preprocessing methods. In spite of the addition of other languages, the recognition accuracy of each language drops only negligibly in experiments on the multi-lingual case with Hangul, English, and digits.

  • An Isolated Word Speech Recognition Based on Fusion of Visual and Auditory Information Using 30-frame/s and 24-bit Color Image

    Akio OGIHARA  Shinobu ASAO  

     
    PAPER

      Vol:
    E80-A No:8
      Page(s):
    1417-1422

    In the field of speech recognition, many researchers have proposed speech recognition methods using auditory information such as the acoustic signal or visual information such as the shape and motion of lips. Auditory information has valid features for speech recognition, but it is difficult to accomplish speech recognition in noisy environments. On the other hand, visual information has the advantage of enabling speech recognition in noisy environments, but it is difficult to extract effective features for speech recognition. Thus, when using only auditory information or only visual information, it is difficult to accomplish speech recognition perfectly. In this paper, we propose a method to fuse auditory information and visual information in order to realize more accurate speech recognition. The proposed method consists of two processes: (1) two probabilities for auditory information and visual information are calculated by HMM, (2) these probabilities are fused by using linear combination. We have performed speech recognition experiments of isolated words, whose auditory information (22.05kHz sampling, 8-bit quantization) and visual information (30-frame/s sampling, 24-bit quantization) are captured with a multi-media personal computer, and have confirmed the validity of the proposed method.
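
    The two-step scheme described above (per-modality HMM scores, then linear combination) might look roughly like the following sketch. It assumes that each modality's HMMs have already produced one score per candidate word; whether the paper combines probabilities or log-probabilities, and what weight it uses, is not stated in the abstract, so the log-likelihood inputs and the `audio_weight` value are assumptions, and the function names are hypothetical.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Linearly combine per-word scores from the auditory and visual HMMs."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return {
        word: audio_weight * audio_scores[word]
        + (1.0 - audio_weight) * visual_scores[word]
        for word in audio_scores
    }


def recognize(audio_scores, visual_scores, audio_weight=0.7):
    """Return the word whose fused score is highest."""
    fused = fuse_scores(audio_scores, visual_scores, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical log-likelihoods for two candidate words.
audio = {"one": -10.0, "two": -12.0}
visual = {"one": -20.0, "two": -8.0}
best = recognize(audio, visual, audio_weight=0.7)
```

    With these made-up scores the auditory model alone prefers "one", while the fused score prefers "two", which illustrates how the visual stream can change the decision. In practice the weight would presumably be tuned to the noise level.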

  • Discriminative Training Based on Minimum Classification Error for a Small Amount of Data Enhanced by Vector-Field-Smoothed Bayesian Learning

    Jun-ichi TAKAHASHI  Shigeki SAGAYAMA  

     
    PAPER-Speech Processing and Acoustics

      Vol:
    E79-D No:12
      Page(s):
    1700-1707

    This paper describes how to effectively use discriminative training based on the Minimum Classification Error (MCE) criterion for a small amount of data in order to attain the highest level of recognition performance. This method is a combination of MCE training and Vector-Field-Smoothed Bayesian learning called MAP/VFS, which combines maximum a posteriori (MAP) estimation with Vector Field Smoothing (VFS). In the proposed method, MAP/VFS can significantly enhance MCE training in the robustness of acoustic modeling. In model training, MCE training is performed using the MAP/VFS-trained model as an initial model. The same data are used in both training stages. For speaker adaptation using several dozen training words, the proposed method has been experimentally proven to be very effective. For 50-word training data, recognition errors are drastically reduced by 47% compared with 16.5% when using only MCE. This high rate, in which 39% is due to MAP, an additional 4% is due to VFS, and a further improvement of 4% is due to MCE, can be attained by enhancing MCE training capability by MAP/VFS.
