
Keyword Search Result

[Keyword] discriminative (32 hits)

Results 21-32 of 32

  • A Framework of Real Time Hand Gesture Vision Based Human-Computer Interaction

    Liang SHA, Guijin WANG, Xinggang LIN, Kongqiao WANG

    PAPER-Vision

    Vol: E94-A, No: 3, Page(s): 979-989

    This paper presents a robust framework for human-computer interaction based on hand gesture vision in realistic and challenging scenarios. To this end, several novel components are proposed. A hybrid approach is first proposed to automatically infer the starting position of hand gestures of interest by jointly optimizing the regions given by an offline skin model trained with Gaussian mixture models and a specific hand gesture classifier trained with the Adaboost technique. To track the hand consistently with kernel-based tracking, a semi-supervised feature selection strategy is further presented to choose the feature subspaces that appropriately represent the properties of offline hand skin cues and online foreground-background classification cues. Using the histogram of oriented gradients as the descriptor of hand gestures, a soft-decision approach is finally proposed for recognizing static hand gestures at locations where severe ambiguity occurs, and hidden Markov model based dynamic gestures are employed for interaction. Experiments on various real video sequences show the superior performance of the proposed components. In addition, the whole framework is suitable for real-time applications on general computing platforms.

  • Learning Speech Variability in Discriminative Acoustic Model Adaptation

    Shoei SATO, Takahiro OKU, Shinichi HOMMA, Akio KOBAYASHI, Toru IMAI

    PAPER-Adaptation

    Vol: E93-D, No: 9, Page(s): 2370-2378

    We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We focus on differences in expressions or speaking styles between tasks, and the objective of this method is to improve the recognition accuracy of indistinctly pronounced phrases that depend on a speaking style. The adaptation appends subword models for frequently observed variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected from the more frequent words in the task's adaptation data by using their word lattices. HMM parameters of the subword models dependent on these words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, a subword accuracy that discriminates between the variants and the originals is also investigated. In speech recognition experiments, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative on a Japanese conversational broadcast task.

  • Constraining a Generative Word Alignment Model with Discriminative Output

    Chooi-Ling GOH, Taro WATANABE, Hirofumi YAMAMOTO, Eiichiro SUMITA

    PAPER-Natural Language Processing

    Vol: E93-D, No: 7, Page(s): 1976-1983

    We present a method to constrain a statistical generative word alignment model with the output from a discriminative model. The discriminative model is trained on a small set of hand-aligned data and yields higher alignment precision, while the generative model improves alignment recall. Combining the two models makes the alignment output more suitable for building the translation model of a phrase-based statistical machine translation (SMT) system. Our experimental results show that the joint alignment model improves translation performance, with an average improvement in BLEU and METEOR scores of around 1.0-3.9 points.

  • Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training

    Makoto SAKAI, Norihide KITAOKA, Yuya HATTORI, Seiichi NAKAGAWA, Kazuya TAKEDA

    LETTER-Speech and Hearing

    Vol: E93-D, No: 2, Page(s): 395-398

    To improve speech recognition performance, acoustic feature transformation based on discriminant analysis has been widely used. Discriminative training of HMMs has also been used for the same purpose. In this letter, we investigate the effectiveness of these two techniques and of their combination. We also investigate their robustness under matched and mismatched noise conditions between the training and evaluation environments.

  • Discriminative Weight Training for Support Vector Machine-Based Speech/Music Classification in 3GPP2 SMV Codec

    Sang-Kyun KIM, Joon-Hyuk CHANG

    LETTER-Speech and Hearing

    Vol: E93-A, No: 1, Page(s): 316-319

    In this study, discriminative weight training is applied to support vector machine (SVM) based speech/music classification for the 3GPP2 selectable mode vocoder (SMV). In the proposed approach, the speech/music decision rule is derived by the SVM from features of the SMV that are optimally weighted using a minimum classification error (MCE) method. This method differs from previous work in that a different weight is assigned to each feature of the SMV, which is a novel process. According to the experimental results, the proposed approach is effective for speech/music classification using the SVM.

  • Automatic Language Identification with Discriminative Language Characterization Based on SVM

    Hongbin SUO, Ming LI, Ping LU, Yonghong YAN

    PAPER-Language Identification

    Vol: E91-D, No: 3, Page(s): 567-575

    Robust automatic language identification (LID) is the task of identifying the language of a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machines (SVM) and general Gaussian mixture models (GMMs). These systems use classifiers to map the cepstral features of spoken utterances to high-level scores. In this paper, in order to increase the dimension of the score vector and alleviate inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate discriminative language characterization score vectors (DLCSV). Back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% on the 30-second task and outperforms state-of-the-art systems with a relative error reduction of more than 30%. In addition, the proposed PPRLM and GMM systems achieve EERs of 5.1% and 5.0%, respectively.

  • Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition

    William BYRNE

    INVITED PAPER

    Vol: E89-D, No: 3, Page(s): 900-907

    Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition (LVCSR) systems in two ways: through the estimation of the parameters of the underlying hidden Markov models, and through the identification of smaller recognition tasks, which provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going 'beyond HMMs', showing in particular that this process of subproblem identification makes it possible to train and apply small-domain binary pattern classifiers, such as Support Vector Machines, to LVCSR.

  • Verification of Multi-Class Recognition Decision: A Classification Approach

    Tomoko MATSUI, Frank K. SOONG, Biing-Hwang JUANG

    PAPER-Spoken Language Systems

    Vol: E88-D, No: 3, Page(s): 455-462

    We investigate strategies to improve utterance verification performance using a two-class pattern classification approach, including utilizing N-best candidate scores, modifying segmentation boundaries, applying background and out-of-vocabulary filler models, incorporating contexts, and minimizing verification errors via discriminative training. A connected-digit database recorded in a noisy, moving car with a hands-free microphone mounted on the sun visor is used to evaluate verification performance. The equal error rate (EER) of word verification is employed as the sole performance measure. All factors and their effects on verification performance are presented in detail. The EER is reduced from 29% with the standard likelihood ratio test to 21.4% when all features are properly integrated.

  • VLSI Architecture and Implementation for Speech Recognizer Based on Discriminative Bayesian Neural Network

    Jhing-Fa WANG, Jia-Ching WANG, An-Nan SUEN, Chung-Hsien WU, Fan-Min LI

    PAPER-Implementations of Signal Processing Systems

    Vol: E85-A, No: 8, Page(s): 1861-1869

    In this paper, we present an efficient VLSI architecture for a stand-alone speech recognition system based on a discriminative Bayesian neural network (DBNN). For the recognition phase, the architecture of the Bayesian distance unit (BDU) is constructed first. In association with the BDU, we propose a template-serial architecture for path distance accumulation to perform the recognition procedure. A corresponding architecture is also developed to accelerate the discriminative training procedure; it contains an intelligent look-up table for the sigmoid function. Compared with the traditional one-table method, memory size is reduced drastically with only a slight loss of accuracy. By combining the proposed hardware accelerators with a cost-efficient programmable core, we exploit the advantages of both programmable and application-specific architectures in performance, design complexity, and flexibility.

  • Subspace Method for Minimum Error Pattern Recognition

    Hideyuki WATANABE, Shigeru KATAGIRI

    PAPER-Image Processing, Computer Graphics and Pattern Recognition

    Vol: E80-D, No: 12, Page(s): 1195-1204

    In general pattern recognition, a pattern to be recognized is first represented by a set of features, and the measured values of the features are then classified. Finding features relevant to recognition is thus an important issue in recognizer design. As a fundamental design framework that systematically enables one to obtain such useful features, the Subspace Method (SM) has been used extensively in various recognition tasks. However, this promising methodological framework is still inadequate. The discriminative power of early versions was not very high. A more recent discriminative version, the Learning Subspace Method, has improved discriminative power, but its training behavior has not been fully clarified because of its empirical definition. To address this shortcoming, we propose in this paper a new discriminative SM algorithm based on the Minimum Classification Error/Generalized Probabilistic Descent method and show that the proposed algorithm achieves optimally accurate recognition, i.e., (at least locally) minimum recognition error, in the probabilistic descent sense.

  • Discriminative Training Based on Minimum Classification Error for a Small Amount of Data Enhanced by Vector-Field-Smoothed Bayesian Learning

    Jun-ichi TAKAHASHI, Shigeki SAGAYAMA

    PAPER-Speech Processing and Acoustics

    Vol: E79-D, No: 12, Page(s): 1700-1707

    This paper describes how to use discriminative training based on the Minimum Classification Error (MCE) criterion effectively with a small amount of data in order to attain the highest level of recognition performance. The method combines MCE training with Vector-Field-Smoothed Bayesian learning, called MAP/VFS, which combines maximum a posteriori (MAP) estimation with Vector Field Smoothing (VFS). In the proposed method, MAP/VFS significantly improves the robustness of MCE training for acoustic modeling. In model training, MCE training is performed using the MAP/VFS-trained model as the initial model, and the same data are used for both training stages. For speaker adaptation using several dozen training words, the proposed method has been experimentally shown to be very effective. With 50 training words, recognition errors are reduced by 47%, compared with 16.5% when only MCE is used. Of this 47% reduction, 39% is due to MAP, an additional 4% is due to VFS, and a further 4% is due to MCE; this high rate is attained by enhancing the MCE training capability with MAP/VFS.

  • A Study on Speaker Adaptation for Mandarin Syllable Recognition with Minimum Error Discriminative Training

    Chih-Heng LIN, Chien-Hsing WU, Pao-Chung CHANG

    PAPER

    Vol: E78-D, No: 6, Page(s): 712-718

    This paper investigates a different method of speaker adaptation for Mandarin syllable recognition. Based on the minimum classification error (MCE) criterion, we use the generalized probabilistic descent (GPD) algorithm to iteratively adjust the parameters of the hidden Markov models (HMMs). Experiments on the multi-speaker Mandarin syllable database of Telecommunication Laboratories (T.L.) yield the following results: 1) Efficient speaker adaptation can be achieved through discriminative training using the MCE criterion and the GPD algorithm. 2) The required computation can be reduced by using the confusion sets of Mandarin base syllables. 3) In discriminative training, adjusting the mean values of the Gaussian mixtures has the most prominent effect on speaker adaptation. 4) The discriminative training approach can be used to enhance the speaker adaptation capability of the maximum a posteriori (MAP) approach.
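
Several of the abstracts above rely on the minimum classification error (MCE) criterion, and Watanabe and Katagiri and Lin et al. optimize it with generalized probabilistic descent (GPD). The following is a minimal Python sketch of that idea, assuming a toy prototype classifier whose discriminant is the negative squared distance to a class mean, in place of the HMM likelihoods used in the papers. The function names, the smoothing parameters `eta` and `gamma`, the step-size schedule and the toy data are illustrative assumptions, not taken from any of the papers.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discriminants(x, means):
    # Hypothetical prototype classifier: g_j(x) = -||x - mu_j||^2
    # (a stand-in for the HMM log-likelihoods used in the papers)
    return [-sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) for mu in means]

def mce_loss_and_grads(x, k, means, eta=2.0, gamma=1.0):
    """Smoothed MCE loss of sample x (true class k) and its gradient w.r.t. each mean."""
    g = discriminants(x, means)
    rivals = [j for j in range(len(means)) if j != k]
    # Soft maximum of the competing discriminants (log-sum-exp, shifted for stability)
    top = max(eta * g[j] for j in rivals)
    w = {j: math.exp(eta * g[j] - top) for j in rivals}
    total = sum(w.values())
    anti = (top + math.log(total / len(rivals))) / eta
    d = -g[k] + anti              # misclassification measure; positive d signals an error
    loss = sigmoid(gamma * d)     # smooth approximation of the 0-1 error count
    dl_dd = gamma * loss * (1.0 - loss)
    grads = []
    for j, mu in enumerate(means):
        dd_dg = -1.0 if j == k else w[j] / total  # chain rule through d
        # dg_j/dmu_j = 2 (x - mu_j)
        grads.append([dl_dd * dd_dg * 2.0 * (xi - mi) for xi, mi in zip(x, mu)])
    return loss, grads

def gpd_train(data, means, epochs=20, eps0=0.1):
    """GPD: per-sample gradient steps on the MCE loss with a shrinking step size."""
    means = [list(mu) for mu in means]  # work on a copy
    for t in range(epochs):
        eps = eps0 / (1 + t)            # step size shrinks every epoch
        for x, k in data:
            _, grads = mce_loss_and_grads(x, k, means)
            for mu, gr in zip(means, grads):
                for i in range(len(mu)):
                    mu[i] -= eps * gr[i]
    return means

def classify(x, means):
    g = discriminants(x, means)
    return max(range(len(means)), key=lambda j: g[j])

# Toy two-class data, made up for illustration
data = [([0.0, 0.0], 0), ([0.5, 0.2], 0), ([0.2, 0.4], 0),
        ([2.0, 2.0], 1), ([1.8, 2.3], 1), ([2.2, 1.7], 1)]
init_means = [[0.9, 0.9], [1.1, 1.1]]
loss_before = sum(mce_loss_and_grads(x, k, init_means)[0] for x, k in data)
trained = gpd_train(data, init_means)
loss_after = sum(mce_loss_and_grads(x, k, trained)[0] for x, k in data)
print(loss_before, loss_after, [classify(x, trained) for x, _ in data])
```

Each update pulls the correct class mean toward the sample and pushes competing means away, weighted by how close the sample is to the decision boundary. Only class means are adjusted here, loosely echoing the Lin et al. finding that adjusting Gaussian mean values has the largest effect; a real recognizer would differentiate through the HMM state likelihoods instead.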

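Two of the abstracts above (Suo et al. and Matsui et al.) report results as an equal error rate (EER), the operating point at which the false acceptance and false rejection rates coincide. The following minimal Python sketch shows one way to read an EER off a set of scored trials, assuming that a higher score means "more likely a target" and that a trial is accepted when its score reaches the threshold. The function name and the scores are hypothetical, and evaluation toolkits may interpolate between thresholds differently.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    best_gap, best_eer = None, None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: non-target trials that pass the threshold
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: target trials that fall below the threshold
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

# Made-up scores for illustration only
targets = [0.9, 0.8, 0.7, 0.6]
nontargets = [0.65, 0.4, 0.3, 0.2]
eer = equal_error_rate(targets, nontargets)
print(eer)  # prints 0.25
```

With finitely many trials the two rates rarely cross exactly, so the sketch returns the midpoint at the threshold where they are closest. Relative figures follow from the same rates: the reduction reported by Matsui et al., from 29% to 21.4% EER, is (29 - 21.4) / 29, or about 26% relative.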