IEICE global.ieice.org Site

Keyword Search Result

[Keyword] discriminative(32hit)

21-32hit(32hit)

A Framework of Real Time Hand Gesture Vision Based Human-Computer Interaction
Liang SHA Guijin WANG Xinggang LIN Kongqiao WANG

PAPER-Vision

Vol: E94-A, No: 3, Page(s): 979-989
This paper presents a robust framework for vision-based human-computer interaction using hand gestures in realistic and challenging scenarios. To this end, several novel components are proposed. First, a hybrid approach automatically infers the starting position of the hand gestures of interest by jointly optimizing the regions given by an offline skin model trained with Gaussian mixture models and by a specific hand gesture classifier trained with the AdaBoost technique. To track the hand consistently with kernel-based tracking, a semi-supervised feature selection strategy is then presented that chooses feature subspaces appropriately representing both offline hand skin cues and online foreground-background classification cues. Finally, using the histogram of oriented gradients as the hand gesture descriptor, a soft-decision approach is proposed for recognizing static hand gestures at locations where severe ambiguity occurs, and hidden Markov model based dynamic gestures are employed for interaction. Experiments on various real video sequences show the superior performance of the proposed components. In addition, the whole framework is applicable to real-time applications on general computing platforms.
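The abstract mentions hidden Markov model based dynamic gestures. As a minimal sketch (not the authors' implementation), assuming each dynamic gesture is modeled by a discrete HMM over quantized motion symbols, a gesture can be classified by scoring the observation sequence with the scaled forward algorithm and picking the model with the highest likelihood. The gesture names, symbols, and probabilities below are hypothetical toy values.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) for a discrete HMM using the scaled forward algorithm.

    start[i]    : initial probability of state i
    trans[i][j] : transition probability from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)  # log P(obs) is the sum of the log scale factors
        alpha = [a / scale for a in alpha]
    return log_lik

def classify_gesture(obs, models):
    """Return the name of the HMM that assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical 2-state left-to-right models; symbol 0 = "moving left", 1 = "moving right".
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "swipe_left": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.9, 0.1]]),
    "swipe_right": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.1, 0.9]]),
}

print(classify_gesture([0, 0, 0, 1], models))  # prints swipe_left
```

Working in log space with per-step rescaling keeps long gesture sequences from underflowing; a sequence the model cannot produce at all would make `math.log` fail, which a real system would need to guard against.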
Learning Speech Variability in Discriminative Acoustic Model Adaptation
Shoei SATO Takahiro OKU Shinichi HOMMA Akio KOBAYASHI Toru IMAI

PAPER-Adaptation

Vol: E93-D, No: 9, Page(s): 2370-2378
We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We focus on differences in expressions or speaking styles between tasks, and the objective of this method is to improve the recognition accuracy of indistinctly pronounced phrases that depend on speaking style. The adaptation appends subword models for frequently observed variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected, using their word lattices, from the higher-frequency words in the task's adaptation data. HMM parameters of the subword models dependent on these words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, a subword accuracy that discriminates between the variants and the originals is also investigated. In speech recognition experiments on a Japanese conversational broadcast task, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative.
Constraining a Generative Word Alignment Model with Discriminative Output
Chooi-Ling GOH Taro WATANABE Hirofumi YAMAMOTO Eiichiro SUMITA

PAPER-Natural Language Processing

Vol: E93-D, No: 7, Page(s): 1976-1983
We present a method to constrain a statistical generative word alignment model with the output of a discriminative model. The discriminative model is trained on a small set of hand-aligned data, which ensures higher alignment precision, while the generative model improves alignment recall. Combining these two models makes the alignment output more suitable for building the translation model of a phrase-based statistical machine translation (SMT) system. Our experimental results show that the joint alignment model improves translation performance, with average improvements in BLEU and METEOR scores of around 1.0-3.9 points.
Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training
Makoto SAKAI Norihide KITAOKA Yuya HATTORI Seiichi NAKAGAWA Kazuya TAKEDA

LETTER-Speech and Hearing

Vol: E93-D, No: 2, Page(s): 395-398
Acoustic feature transformation based on discriminant analysis has been widely used to improve speech recognition performance, and discriminative training of HMMs has also been used for the same purpose. In this letter, we investigate the effectiveness of these two techniques and of their combination. We also investigate their robustness under matched and mismatched noise conditions between the training and evaluation environments.
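Discriminant analysis-based feature transformation projects features onto directions that separate classes well. The letter does not specify its exact transform here, so the following is only a minimal sketch of the simplest case, two-class Fisher linear discriminant analysis on 2-D features, where the projection direction is proportional to S_w^{-1}(m1 - m2), with S_w the within-class scatter matrix and m1, m2 the class means. Multi-class transforms generalize this through an eigenvalue problem. The function name and toy data are hypothetical.

```python
import math

def fisher_direction(class1, class2):
    """Unit-length two-class Fisher LDA direction w ∝ S_w^{-1} (m1 - m2) for 2-D points."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    m1, m2 = mean(class1), mean(class2)
    # Within-class scatter S_w: summed outer products of deviations from each class mean.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class1, m1), (class2, m2)):
        for p in points:
            dev = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]

# Hypothetical 2-D features: the classes differ only along the first axis.
class1 = [(1.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
class2 = [(-1.0, 0.0), (-1.0, 2.0), (-2.0, 1.0)]
w = fisher_direction(class1, class2)
projected = [w[0] * x + w[1] * y for x, y in class1 + class2]  # 1-D transformed features
```

The discriminative HMM training that the letter combines with such a transform operates on the transformed features and is not shown.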
Discriminative Weight Training for Support Vector Machine-Based Speech/Music Classification in 3GPP2 SMV Codec
Sang-Kyun KIM Joon-Hyuk CHANG

LETTER-Speech and Hearing

Vol: E93-A, No: 1, Page(s): 316-319
In this study, discriminative weight training is applied to support vector machine (SVM)-based speech/music classification for the 3GPP2 selectable mode vocoder (SMV). In the proposed approach, the SVM derives the speech/music decision rule from SMV features that are optimally weighted using a minimum classification error (MCE) method. Unlike previous work, this method assigns a different weight to each SMV feature, which is a novel process. Experimental results show that the proposed approach is effective for SVM-based speech/music classification.
Automatic Language Identification with Discriminative Language Characterization Based on SVM
Hongbin SUO Ming LI Ping LU Yonghong YAN

PAPER-Language Identification

Vol: E91-D, No: 3, Page(s): 567-575
Robust automatic language identification (LID) is the task of identifying the language of a short utterance spoken by an unknown speaker. Mainstream approaches include parallel phone recognition followed by language modeling (PPRLM), support vector machines (SVMs), and general Gaussian mixture models (GMMs). These systems use classifiers to map the cepstral features of spoken utterances into high-level scores. In this paper, to increase the dimension of the score vector and alleviate inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate discriminative language characterization score vectors (DLCSV). Back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of the back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on the 2003 NIST Language Recognition Evaluation (LRE) databases, and the experiments show that the system described in this paper produces results comparable to those of existing systems. In particular, the SVM framework achieves an equal error rate (EER) of 4.0% on the 30-second task and outperforms state-of-the-art systems by more than 30% relative error reduction. In addition, the proposed PPRLM and GMM systems achieve EERs of 5.1% and 5.0%, respectively.
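The abstract does not detail the PPPE algorithm. As a stand-in only, the sketch below illustrates the general idea of turning pairwise classifier outputs into class posteriors with a simple closed-form pairwise coupling rule from the literature (often attributed to Price et al.): given pairwise estimates r[i][j] ≈ P(class i | class i or j, x), set p_i ∝ 1 / (Σ_{j≠i} 1/r[i][j] - (K - 2)) and renormalize. This is not necessarily the method used in the paper, and the function name and numbers are hypothetical.

```python
def pairwise_coupling(r, eps=1e-12):
    """Combine pairwise probabilities r[i][j] ~ P(i | i or j, x) into K class posteriors.

    Uses the closed form p_i ∝ 1 / (sum_{j != i} 1 / r[i][j] - (K - 2)),
    then renormalizes so the posteriors sum to 1. Diagonal entries are ignored.
    """
    k = len(r)
    raw = []
    for i in range(k):
        # Each 1 / r[i][j] >= 1, so the denominator is always at least 1.
        inv_sum = sum(1.0 / max(r[i][j], eps) for j in range(k) if j != i)
        raw.append(1.0 / (inv_sum - (k - 2)))
    total = sum(raw)
    return [p / total for p in raw]

# Hypothetical pairwise outputs that are exactly consistent with posteriors [0.5, 0.3, 0.2].
true_p = [0.5, 0.3, 0.2]
r = [[0.5 if i == j else true_p[i] / (true_p[i] + true_p[j]) for j in range(3)]
     for i in range(3)]
print([round(p, 3) for p in pairwise_coupling(r)])  # prints [0.5, 0.3, 0.2]
```

When the pairwise estimates are mutually consistent, as in this toy input, the rule recovers the underlying posteriors exactly; with real, noisy classifier outputs it only yields an approximation.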
Minimum Bayes Risk Estimation and Decoding in Large Vocabulary Continuous Speech Recognition
William BYRNE

INVITED PAPER

Vol: E89-D, No: 3, Page(s): 900-907
Minimum Bayes risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition (LVCSR) systems in two ways: through estimation of the parameters of the underlying hidden Markov models, and through identification of smaller recognition tasks. The latter provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going 'beyond HMMs'; in particular, this process of subproblem identification makes it possible to train small-domain binary pattern classifiers, such as Support Vector Machines, and apply them to large vocabulary continuous speech recognition.
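The core MBR decision rule chooses the hypothesis with the lowest expected loss under the posterior distribution, rather than the single most probable hypothesis. The paper works on lattices with lattice segmentation; the sketch below is a simplification that applies the same rule to a small N-best list, using word-level Levenshtein distance as the loss. The hypotheses and posteriors are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (wa != wb))) # substitution or match
        prev = cur
    return prev[-1]

def mbr_decode(nbest):
    """nbest: list of (word_list, posterior). Return the hypothesis with minimum expected loss."""
    def risk(hyp):
        return sum(p * word_edit_distance(hyp, ref) for ref, p in nbest)
    return min((hyp for hyp, _ in nbest), key=risk)

# Hypothetical N-best list with posteriors.
nbest = [
    (["a", "cat", "sat"], 0.35),
    (["the", "cat", "sad"], 0.33),
    (["the", "cat", "sat"], 0.32),
]
print(max(nbest, key=lambda e: e[1])[0])  # MAP choice: ['a', 'cat', 'sat']
print(mbr_decode(nbest))                  # MBR choice: ['the', 'cat', 'sat']
```

Here the expected word errors are 0.98, 1.02, and 0.68 for the three hypotheses, so MBR prefers the third one, which is close to every competitor, even though it has the lowest posterior.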
Verification of Multi-Class Recognition Decision: A Classification Approach
Tomoko MATSUI Frank K. SOONG Biing-Hwang JUANG

PAPER-Spoken Language Systems

Vol: E88-D, No: 3, Page(s): 455-462
We investigate strategies to improve utterance verification performance using a two-class pattern classification approach, including utilizing N-best candidate scores, modifying segmentation boundaries, applying background and out-of-vocabulary filler models, incorporating contexts, and minimizing verification errors via discriminative training. A connected-digit database recorded in a noisy, moving car with a hands-free microphone mounted on the sun visor is used to evaluate verification performance. The equal error rate (EER) of word verification is used as the sole performance measure. All factors and their effects on verification performance are presented in detail. The EER is reduced from 29% with the standard likelihood ratio test to 21.4% when all features are properly integrated.
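Because the EER is the sole performance measure here, a minimal sketch of how it can be computed from verification scores may help: sweep the decision threshold over the observed scores and take the operating point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest, averaging the two. It assumes higher scores mean "accept", and the scores are hypothetical; finer-grained EER estimates interpolate between thresholds instead.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when score >= threshold.
    FRR = fraction of genuine trials rejected; FAR = fraction of impostor trials accepted.
    """
    best_gap, best_eer = None, None
    for thr in sorted(set(genuine) | set(impostor)):
        frr = sum(s < thr for s in genuine) / len(genuine)
        far = sum(s >= thr for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer

genuine = [0.9, 0.8, 0.7, 0.4]   # hypothetical scores for correctly recognized words
impostor = [0.6, 0.3, 0.2, 0.1]  # hypothetical scores for misrecognized words
print(equal_error_rate(genuine, impostor))  # prints 0.25
```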
VLSI Architecture and Implementation for Speech Recognizer Based on Discriminative Bayesian Neural Network
Jhing-Fa WANG Jia-Ching WANG An-Nan SUEN Chung-Hsien WU Fan-Min LI

PAPER-Implementations of Signal Processing Systems

Vol: E85-A, No: 8, Page(s): 1861-1869
In this paper, we present an efficient VLSI architecture for a stand-alone speech recognition system based on a discriminative Bayesian neural network (DBNN). For the recognition phase, the architecture of the Bayesian distance unit (BDU) is constructed first. In association with the BDU, we propose a template-serial architecture for path distance accumulation to perform the recognition procedure. A corresponding architecture is also developed to accelerate the discriminative training procedure; it contains an intelligent look-up table for the sigmoid function. Compared with the traditional one-table method, this drastically reduces memory size with only a slight loss of accuracy. By combining the proposed hardware accelerators with a cost-efficient programmable core, we exploit the advantages of both programmable and application-specific architectures, including performance, design complexity, and flexibility.
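The abstract does not describe how the intelligent look-up table is organized. As an illustration of the general idea only, the sketch below approximates the sigmoid with a uniformly sampled table that covers only non-negative inputs, reuses it for negative inputs via the identity sigmoid(-x) = 1 - sigmoid(x), and saturates beyond a clipping point. The table size, range, and class name are hypothetical choices, and the paper's own technique may differ.

```python
import math

class SigmoidLUT:
    """Sigmoid approximated by a table sampled on [0, x_max], mirrored for negative inputs."""

    def __init__(self, x_max=8.0, entries=256):
        self.x_max = x_max
        self.step = x_max / (entries - 1)
        # Only non-negative inputs are stored; sigmoid(-x) = 1 - sigmoid(x) covers the rest.
        self.table = [1.0 / (1.0 + math.exp(-i * self.step)) for i in range(entries)]

    def __call__(self, x):
        ax = min(abs(x), self.x_max)       # saturate beyond the table range
        idx = int(round(ax / self.step))   # nearest table entry
        y = self.table[idx]
        return y if x >= 0 else 1.0 - y

def exact_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

lut = SigmoidLUT()
worst = max(abs(lut(i / 100) - exact_sigmoid(i / 100)) for i in range(-1000, 1001))
print(worst < 0.005)  # prints True
```

With 256 entries over [0, 8], the nearest-entry error is bounded by half a step times the sigmoid's maximum slope of 0.25, which is about 0.004; the mirroring halves the table compared with storing both signs.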
Subspace Method for Minimum Error Pattern Recognition
Hideyuki WATANABE Shigeru KATAGIRI

PAPER-Image Processing,Computer Graphics and Pattern Recognition

Vol: E80-D, No: 12, Page(s): 1195-1204
In general pattern recognition, a pattern to be recognized is first represented by a set of features, and the measured feature values are then classified. Finding features relevant to recognition is thus an important issue in recognizer design. The Subspace Method (SM), a fundamental design framework that systematically enables one to realize such useful features, has been used extensively in various recognition tasks. However, this promising framework is still inadequate. The discriminative power of early versions was not very high. A more recent discriminative version, the Learning Subspace Method, has improved discriminative power, but its training behavior has not been fully clarified because of its empirical definition. To alleviate this insufficiency, we propose in this paper a new discriminative SM algorithm based on the Minimum Classification Error/Generalized Probabilistic Descent method and show that the proposed algorithm achieves optimal recognition accuracy, i.e., the (at least locally) minimum recognition error, in the probabilistic descent sense.
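For readers unfamiliar with the Subspace Method, its basic decision rule represents each class by an orthonormal basis of a low-dimensional subspace and assigns an input to the class onto whose subspace it has the largest squared projection. The sketch below shows only this classification rule, in the style of CLAFIC, not the MCE/GPD training proposed in the paper. For brevity it builds each basis by Gram-Schmidt orthonormalization of a few hypothetical spanning vectors, whereas classical SM variants typically use the leading eigenvectors of each class's autocorrelation matrix.

```python
import math

def orthonormalize(vectors, tol=1e-10):
    """Gram-Schmidt: return an orthonormal basis spanning the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:  # skip vectors that are (numerically) linearly dependent
            basis.append([a / norm for a in w])
    return basis

def projection_energy(x, basis):
    """Squared length of the orthogonal projection of x onto span(basis)."""
    return sum(sum(a * b for a, b in zip(x, u)) ** 2 for u in basis)

def subspace_classify(x, class_bases):
    """Assign x to the class whose subspace captures the most of its energy."""
    return max(class_bases, key=lambda c: projection_energy(x, class_bases[c]))

# Hypothetical 3-D example: class "A" spans the x-y plane, class "B" the z axis.
class_bases = {
    "A": orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]),
    "B": orthonormalize([[0.0, 0.0, 1.0]]),
}
print(subspace_classify([0.1, 0.2, 0.9], class_bases))  # prints B
print(subspace_classify([1.0, 1.0, 0.2], class_bases))  # prints A
```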
Discriminative Training Based on Minimum Classification Error for a Small Amount of Data Enhanced by Vector-Field-Smoothed Bayesian Learning
Jun-ichi TAKAHASHI Shigeki SAGAYAMA

PAPER-Speech Processing and Acoustics

Vol: E79-D, No: 12, Page(s): 1700-1707
This paper describes how to effectively use discriminative training based on the Minimum Classification Error (MCE) criterion with a small amount of data in order to attain the highest level of recognition performance. The method combines MCE training with a Vector-Field-Smoothed Bayesian learning method called MAP/VFS, which combines maximum a posteriori (MAP) estimation with Vector Field Smoothing (VFS). In the proposed method, MAP/VFS significantly enhances the robustness of acoustic modeling in MCE training. In model training, MCE training is performed using the MAP/VFS-trained model as the initial model, and the same data are used in both training stages. For speaker adaptation using several dozen training words, the proposed method has been experimentally proven to be very effective. With 50 training words, recognition errors are drastically reduced by 47%, compared with 16.5% when only MCE is used. Of this 47% reduction, 39% is due to MAP, an additional 4% is due to VFS, and a further 4% is due to MCE; this high rate is attained by enhancing MCE training capability with MAP/VFS.
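The MAP part of MAP/VFS can be illustrated with the standard MAP update for a Gaussian mean, which interpolates between the prior (speaker-independent) mean and the adaptation data: μ_MAP = (τ μ0 + Σ_t x_t) / (τ + N). This minimal sketch assumes every adaptation frame is assigned to a single Gaussian (in an HMM, frames would instead be weighted by their occupation probabilities), and τ is a hypothetical prior weight. VFS, which smooths the resulting mean shift vectors across neighboring Gaussians, and the subsequent MCE stage are not shown.

```python
def map_mean(prior_mean, frames, tau):
    """MAP estimate of a Gaussian mean: (tau * prior_mean + sum(frames)) / (tau + N)."""
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(f[d] for f in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]

prior = [0.0, 0.0]                 # hypothetical speaker-independent mean
frames = [[2.0, 2.0], [4.0, 4.0]]  # hypothetical adaptation frames
print(map_mean(prior, frames, tau=2.0))  # prints [1.5, 1.5]
print(map_mean(prior, frames, tau=0.0))  # prints [3.0, 3.0], the ML estimate
```

A small τ trusts the adaptation data, while a large τ keeps the estimate close to the prior, which is what makes MAP-style estimation attractive when adaptation data are scarce.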
A Study on Speaker Adaptation for Mandarin Syllable Recognition with Minimum Error Discriminative Training
Chih-Heng LIN Chien-Hsing WU Pao-Chung CHANG

PAPER

Vol: E78-D, No: 6, Page(s): 712-718
This paper investigates a different method of speaker adaptation for Mandarin syllable recognition. Based on the minimum classification error (MCE) criterion, we use the generalized probabilistic descent (GPD) algorithm to iteratively adjust the parameters of the hidden Markov models (HMMs). Experiments on the multi-speaker Mandarin syllable database of Telecommunication Laboratories (T.L.) yield the following results: 1) Efficient speaker adaptation can be achieved through discriminative training using the MCE criterion and the GPD algorithm. 2) The required computation can be reduced by using confusion sets of Mandarin base syllables. 3) In discriminative training, adjusting the mean values of the Gaussian mixtures has the most prominent effect on speaker adaptation. 4) The discriminative training approach can be used to enhance the speaker adaptation capability of the maximum a posteriori (MAP) approach.
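The MCE/GPD procedure can be sketched in a deliberately simplified setting: each class is a single Gaussian with identity covariance instead of an HMM, the discriminant is g_k(x) = -||x - μ_k||²/2, the misclassification measure is d = g_rival - g_correct using the best competing class (the limiting case of the usual soft-max form), and the loss is the sigmoid of γd. One GPD step then moves the correct mean toward the sample and the rival mean away from it, scaled by the slope of the sigmoid; this mirrors the finding that adjusting the means matters most. The learning rate, γ, and data are hypothetical.

```python
import math

def mce_gpd_step(x, label, means, lr=0.1, gamma=1.0):
    """Apply one GPD update of the class means for sample x; return the MCE loss before the update."""
    def g(mu):  # discriminant: negative half squared Euclidean distance
        return -0.5 * sum((a - b) ** 2 for a, b in zip(x, mu))

    scores = {c: g(mu) for c, mu in means.items()}
    rival = max((c for c in means if c != label), key=lambda c: scores[c])
    d = scores[rival] - scores[label]           # d > 0 means x is misclassified
    loss = 1.0 / (1.0 + math.exp(-gamma * d))   # smoothed 0-1 loss
    slope = gamma * loss * (1.0 - loss)         # derivative of the loss with respect to d
    # Gradient descent on the loss: pull the correct mean toward x, push the rival away.
    means[label] = [m + lr * slope * (a - m) for a, m in zip(x, means[label])]
    means[rival] = [m - lr * slope * (a - m) for a, m in zip(x, means[rival])]
    return loss

means = {"a": [0.0], "b": [1.0]}  # hypothetical 1-D class means
x, label = [0.6], "a"             # x starts out closer to the wrong class "b"
losses = [mce_gpd_step(x, label, means) for _ in range(50)]
print(losses[0] > losses[-1])  # prints True
```

In practice the updates cycle over all training tokens with a decreasing learning rate; repeating one token here only shows the direction of the updates.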
