
Keyword Search Result

[Keyword] MLLR (10 hits)

Hits 1-10
  • Smoothing Method for Improved Minimum Phone Error Linear Regression

    Yaohui QI  Fuping PAN  Fengpei GE  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-Speech and Hearing

      Vol: E97-D No:8  Page(s): 2105-2113

    A smoothing method for minimum phone error linear regression (MPELR) is proposed in this paper. We show that the objective function for minimum phone error (MPE) can be combined with a prior mean distribution. When the prior mean distribution is based on maximum likelihood (ML) estimates, the proposed method reduces to the previous smoothing technique for MPELR. Instead of ML estimates, a maximum a posteriori (MAP) parameter estimate is used to define the mode of the prior mean distribution and thereby improve the performance of MPELR. Experiments on a large-vocabulary speech recognition task show that the proposed method obtains an 8.4% relative reduction in word error rate when the amount of adaptation data is limited, while retaining the same asymptotic performance as conventional MPELR. Compared with discriminative maximum a posteriori linear regression (DMAPLR), the proposed method shows improvement except in the case of limited adaptation data for supervised adaptation.
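
The interpolation idea behind this kind of prior smoothing can be illustrated with a toy sketch. The function names, numbers, and the simplified I-smoothing-style update below are illustrative stand-ins, not the paper's exact formulas:

```python
import numpy as np

def map_mean(mu_ml, n_frames, mu_prior, tau=20.0):
    """MAP estimate of a Gaussian mean: interpolate the ML estimate
    with a prior mean, weighted by frame count and prior weight tau."""
    return (n_frames * mu_ml + tau * mu_prior) / (n_frames + tau)

def smoothed_mpe_mean(num_stats, den_stats, gamma, mu_mode, d=50.0):
    """Simplified I-smoothing-style update: back off the discriminative
    statistics toward the prior mode mu_mode when occupancy is small."""
    return (num_stats - den_stats + d * mu_mode) / (gamma + d)

mu_ml = np.array([1.0, 2.0])   # ML mean from adaptation data (toy values)
mu_si = np.array([0.0, 0.0])   # speaker-independent prior mean
mode = map_mean(mu_ml, n_frames=10, mu_prior=mu_si)           # MAP prior mode
mu_new = smoothed_mpe_mean(np.array([5.0, 10.0]),             # toy MPE stats
                           np.array([1.0, 2.0]), gamma=4.0, mu_mode=mode)
```

With little data (10 frames against tau=20), the MAP mode stays close to the speaker-independent prior, which is exactly the regime where the smoothing matters.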

  • Variable Selection Linear Regression for Robust Speech Recognition

    Yu TSAO  Ting-Yao HU  Sakriani SAKTI  Satoshi NAKAMURA  Lin-shan LEE  

     
    PAPER-Speech Recognition

      Vol: E97-D No:6  Page(s): 1477-1487

    This study proposes a variable selection linear regression (VSLR) adaptation framework to improve the accuracy of automatic speech recognition (ASR) with only limited and unlabeled adaptation data. The framework comprises three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original regression variable set. The second phase determines the best variable subset based on a pre-determined performance evaluation criterion and computes a linear regression (LR) mapping function based on that subset. The third phase performs adaptation in either the model or the feature space. Together, the three phases select the optimal components and effectively remove redundancies in the LR mapping function, enabling VSLR to provide satisfactory adaptation performance even with very limited adaptation statistics. We formulate model-space VSLR and feature-space VSLR by integrating the VS techniques into conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model-space VSLR and feature-space VSLR outperform standard maximum likelihood linear regression (MLLR) and feature-space MLLR (fMLLR), respectively, as well as their extensions, with notable word error rate (WER) reductions under per-utterance unsupervised adaptation.
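
The three phases can be sketched with a toy linear-regression example. The correlation ranking filter, the held-out squared-error criterion, and all data below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                     # source features (toy)
w_true = np.array([2.0, -1.0, 0, 0, 0, 0, 0, 0])  # only 2 variables matter
Y = X @ w_true + 0.01 * rng.normal(size=200)      # target features

# Phase 1: rank regression variables with a simple correlation filter.
scores = np.abs([np.corrcoef(X[:, j], Y)[0, 1] for j in range(X.shape[1])])
order = np.argsort(scores)[::-1]

# Phase 2: pick the subset size minimizing held-out squared error,
# then fit the LR mapping on that subset.
def holdout_err(cols):
    Xtr, Ytr, Xte, Yte = X[:150, cols], Y[:150], X[150:, cols], Y[150:]
    w, *_ = np.linalg.lstsq(Xtr, Ytr, rcond=None)
    return np.mean((Xte @ w - Yte) ** 2)

best_err, best_k = min((holdout_err(order[:k]), k) for k in range(1, 9))
subset = order[:best_k]

# Phase 3: the mapping on the selected subset would then be applied
# in model or feature space.
w_hat, *_ = np.linalg.lstsq(X[:, subset], Y, rcond=None)
```

The selection step drops the six pure-noise variables, which is the redundancy removal the abstract describes.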

  • Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition

    Arata ITOH  Sunao HARA  Norihide KITAOKA  Kazuya TAKEDA  

     
    PAPER-Speech and Hearing

      Vol: E95-D No:10  Page(s): 2479-2485

    A novel speech-feature-generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. Speaker adaptation methods have been widely used for decades, but all of them require adaptation data. Our proposed method instead aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation and then training our models on the generated features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA, and estimate the distribution of the weight parameters that express the transformation matrices of the existing speakers. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution, and apply the transformations to the normalized features of the existing speakers to generate features for the pseudo-speakers. Finally, we train the acoustic models using these features. Evaluation results show that acoustic models trained with the proposed method are robust to unknown speakers.
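
The generation pipeline — PCA bases from vectorized MLLR matrices, a distribution over weights, sampling, and applying the sampled transform — might be sketched as follows. The dimensions, the diagonal-Gaussian weight model, and the random toy transforms are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_spk, dim = 20, 3                      # toy: 20 speakers, 3-dim features
# Toy MLLR transforms W (dim x dim+1: rotation + bias), one per speaker.
W = np.stack([np.hstack([np.eye(dim) + 0.1 * rng.normal(size=(dim, dim)),
                         0.1 * rng.normal(size=(dim, 1))])
              for _ in range(n_spk)])
V = W.reshape(n_spk, -1)                # vectorize each transform

# Extract bases of the transforms with PCA (SVD of the centered matrix).
mean_v = V.mean(axis=0)
U, S, Vt = np.linalg.svd(V - mean_v, full_matrices=False)
bases = Vt[:5]                          # top-5 principal directions
weights = (V - mean_v) @ bases.T        # per-speaker weight parameters

# Fit a simple Gaussian to the weights and sample a pseudo-speaker.
mu_w, std_w = weights.mean(axis=0), weights.std(axis=0)
w_new = rng.normal(mu_w, std_w)
W_pseudo = (mean_v + w_new @ bases).reshape(dim, dim + 1)

# Apply the pseudo-speaker transform to a normalized feature vector.
x = rng.normal(size=dim)
x_pseudo = W_pseudo @ np.append(x, 1.0)
```

Features generated this way for many sampled pseudo-speakers would then feed the acoustic model training step.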

  • Enhancing Eigenspace-Based MLLR Speaker Adaptation Using a Fuzzy Logic Learning Control Scheme

    Ing-Jr DING  

     
    PAPER

      Vol: E94-D No:10  Page(s): 1909-1916

    This study develops a fuzzy logic control mechanism for eigenspace-based MLLR speaker adaptation. Specifically, the mechanism determines hidden Markov model parameters so as to enhance overall recognition performance under both ordinary and adverse conditions in the training and operating stages. The proposed mechanism regulates the influence of eigenspace-based MLLR adaptation when training data from a new speaker are insufficient. It accounts for the amount of adaptation data available when smoothing the transformation matrix parameters, and thus ensures the robustness of eigenspace-based MLLR adaptation against data scarcity. The proposed adaptive learning mechanism is computationally inexpensive. Experimental results show that eigenspace-based MLLR adaptation with fuzzy control outperforms conventional eigenspace-based MLLR, especially when the adaptation data acquired from a new speaker are insufficient.
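
The core smoothing idea — trusting the adapted transform in proportion to how much adaptation data is available — can be sketched with a simple piecewise-linear membership function. The thresholds and interpolation rule below are illustrative, not the paper's actual fuzzy controller:

```python
import numpy as np

def fuzzy_weight(n_frames, low=100.0, high=1000.0):
    """Piecewise-linear membership: trust the adapted transform not at
    all below `low` frames, fully above `high`, linearly in between."""
    return float(np.clip((n_frames - low) / (high - low), 0.0, 1.0))

def smoothed_mean(mu_adapted, mu_si, n_frames):
    """Interpolate the eigenspace-MLLR adapted mean with the
    speaker-independent mean according to the fuzzy weight."""
    a = fuzzy_weight(n_frames)
    return a * mu_adapted + (1.0 - a) * mu_si

mu_si = np.zeros(2)        # speaker-independent HMM mean (toy)
mu_adapted = np.ones(2)    # eigenspace-MLLR adapted mean (toy)
mu_used = smoothed_mean(mu_adapted, mu_si, n_frames=550)
```

With scarce data the model falls back to the speaker-independent parameters, which is the robustness-against-data-scarcity behavior the abstract claims.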

  • Regularized Maximum Likelihood Linear Regression Adaptation for Computer-Assisted Language Learning Systems

    Dean LUO  Yu QIAO  Nobuaki MINEMATSU  Keikichi HIROSE  

     
    PAPER-Educational Technology

      Vol: E94-D No:2  Page(s): 308-316

    This study focuses on speaker adaptation techniques for Computer-Assisted Language Learning (CALL). We first investigate the effects and problems of Maximum Likelihood Linear Regression (MLLR) speaker adaptation when used in pronunciation evaluation. Automatic scoring and error detection experiments are conducted on two publicly available databases of Japanese learners' English pronunciation. As expected, over-adaptation causes misjudgment of pronunciation accuracy. Following this analysis, we propose a novel method, Regularized Maximum Likelihood Linear Regression (Regularized-MLLR) adaptation, to counter the adverse effects of MLLR adaptation. The method uses a group of teachers' data to regularize learners' transformation matrices so that erroneous pronunciations are not mistakenly transformed into correct ones. We implement this idea in two ways: one uses the average of the teachers' transformation matrices as a constraint on MLLR, and the other uses linear combinations of the teachers' matrices to represent learners' transformations. Experimental results show that the proposed methods make better use of MLLR adaptation and avoid over-adaptation.
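
The two regularization variants might be sketched as follows. The matrix sizes, the interpolation weight, and the least-squares fit for the combination weights are illustrative assumptions, not the paper's estimation procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy teacher MLLR transforms, all near identity (good pronunciation).
teachers = [np.eye(3) + 0.05 * rng.normal(size=(3, 3)) for _ in range(5)]

def regularize_toward_mean(W_learner, teachers, lam=0.5):
    """Variant 1: pull the learner's transform toward the teachers'
    average, limiting how far adaptation can drift."""
    W_bar = np.mean(teachers, axis=0)
    return (1.0 - lam) * W_learner + lam * W_bar

def teacher_combination(W_learner, teachers):
    """Variant 2: represent the learner's transform as a linear
    combination of teacher transforms (weights via least squares)."""
    T = np.stack([t.ravel() for t in teachers]).T   # columns = teachers
    w, *_ = np.linalg.lstsq(T, W_learner.ravel(), rcond=None)
    return (T @ w).reshape(W_learner.shape)

W_learner = np.eye(3) + 0.5 * rng.normal(size=(3, 3))  # over-adapted (toy)
W_reg = regularize_toward_mean(W_learner, teachers)
W_comb = teacher_combination(W_learner, teachers)
```

Both variants keep the learner's transform near the teachers' subspace, so mispronunciations are not "corrected away" by adaptation.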

  • Research on Intersession Variability Compensation for MLLR-SVM Speaker Recognition

    Shan ZHONG  Yuxiang SHAN  Liang HE  Jia LIU  

     
    PAPER-Speech/Audio

      Vol: E92-A No:8  Page(s): 1892-1897

    One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario in which training conversations from multilingual speakers are collected in a number of languages, leading to further performance decline. One important reason is that more and more researchers use phonetic clustering to introduce high-level information into speaker recognition, but such language-dependent methods do not work well under multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms that adapt a universal background model (UBM) are adopted as features. We first introduce a novel language-independent statistical binary decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge-based one. We also construct a framework for channel compensation using feature-domain latent factor analysis (LFA) and model-domain nuisance attribute projection (NAP) with an MLLR supervector kernel. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems, and combine them for further improvement.
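
The model-domain NAP step — projecting MLLR supervectors onto the complement of a nuisance subspace — can be sketched as below. The nuisance basis here is random for illustration; in practice it would be estimated from within-speaker, cross-channel variation:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 12, 2
# Toy MLLR supervectors (vectorized adaptation transforms per session).
supervecs = rng.normal(size=(40, d))

# Orthonormal basis of the nuisance (channel) subspace, toy version.
U, _ = np.linalg.qr(rng.normal(size=(d, k)))

# NAP: project each supervector onto the complement of that subspace
# before it enters the SVM kernel.
P = np.eye(d) - U @ U.T
compensated = supervecs @ P.T
```

After projection, the supervectors carry no component along the nuisance directions, which is what removes the channel variability from the kernel.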

  • Average-Voice-Based Speech Synthesis Using HSMM-Based Speaker Adaptation and Adaptive Training

    Junichi YAMAGISHI  Takao KOBAYASHI  

     
    PAPER-Speech and Hearing

      Vol: E90-D No:2  Page(s): 533-543

    In speaker adaptation for speech synthesis, it is desirable to convert both voice characteristics and prosodic features such as F0 and phone duration. For simultaneous adaptation of spectrum, F0 and phone duration within the HMM framework, we need to transform not only the state output distributions corresponding to spectrum and F0 but also the duration distributions corresponding to phone duration. However, it is not straightforward to adapt the state duration because the original HMM does not have explicit duration distributions. Therefore, we utilize the framework of the hidden semi-Markov model (HSMM), which is an HMM having explicit state duration distributions, and we apply an HSMM-based model adaptation algorithm to simultaneously transform both the state output and state duration distributions. Furthermore, we propose an HSMM-based adaptive training algorithm to simultaneously normalize the state output and state duration distributions of the average voice model. We incorporate these techniques into our HSMM-based speech synthesis system, and show their effectiveness from the results of subjective and objective evaluation tests.
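
Adapting both distributions amounts to applying affine MLLR-style transforms to the state output means and to the explicit state duration means; the transforms and dimensions below are toy values for illustration:

```python
import numpy as np

# Toy HSMM state: a spectral output mean, plus an explicit Gaussian
# duration mean (which a plain HMM lacks).
mu_out = np.array([1.0, -2.0, 0.5])
mu_dur = np.array([8.0])               # mean state duration in frames

# HSMM-based adaptation transforms BOTH distributions with affine maps
# W @ [mu; 1] (extended-mean form).
W_out = np.hstack([1.1 * np.eye(3), np.full((3, 1), 0.2)])  # 3 x 4
W_dur = np.array([[0.9, 1.0]])                              # 1 x 2

mu_out_adapted = W_out @ np.append(mu_out, 1.0)
mu_dur_adapted = W_dur @ np.append(mu_dur, 1.0)
```

The adaptive-training counterpart would estimate such transforms per training speaker and re-estimate the average voice model under them, normalizing both output and duration statistics.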

  • Hybrid Voice Conversion of Unit Selection and Generation Using Prosody Dependent HMM

    Tadashi OKUBO  Ryo MOCHIZUKI  Tetsunori KOBAYASHI  

     
    PAPER-Speech and Hearing

      Vol: E89-D No:11  Page(s): 2775-2782

    We propose a hybrid voice conversion method that combines HMM-based unit selection and spectrum generation. When candidates for the target unit exist in the target speaker's corpus, HMM-based unit selection selects the most likely unit for the required phoneme context; selection is based on the sequence of spectral probability distributions obtained from the adapted HMMs. When a target unit does not exist in the corpus, the target waveform is instead generated from the adapted HMM sequence by maximizing the spectral likelihood. The proposed method also employs an HMM in which the spectral probability distribution is adjusted to the target prosody using weights defined by the prosodic probability of each distribution. To show the effectiveness of the proposed method, sound quality and speaker individuality tests were conducted. The results revealed that the proposed method produces high-quality speech whose speaker individuality is closer to the target speaker than that of conventional methods.
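
The selection-or-generation decision might be sketched as follows. The corpus contents, context labels, and the spherical-Gaussian likelihood are illustrative stand-ins for the adapted HMM machinery:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy corpus: candidate units per phoneme context, as spectral vectors.
corpus = {"a-i+u": [rng.normal(size=4) for _ in range(3)]}

def hmm_mean(context):
    """Stand-in for the adapted HMM's spectral mean for a context."""
    return np.zeros(4)

def log_likelihood(unit, mean):
    # Spherical Gaussian log-likelihood, up to an additive constant.
    return -0.5 * np.sum((unit - mean) ** 2)

def convert(context):
    """Select the most likely stored unit when candidates exist;
    otherwise fall back to generating from the adapted HMM."""
    mean = hmm_mean(context)
    if context in corpus:
        return max(corpus[context], key=lambda u: log_likelihood(u, mean))
    return mean   # ML generation collapses to the mean in this toy model

selected = convert("a-i+u")     # unit-selection path
generated = convert("o-k+e")    # generation path (context not in corpus)
```

Real unit selection would score whole unit sequences against the distribution sequence and add join costs; the toy version only shows the branch between the two paths.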

  • A Style Adaptation Technique for Speech Synthesis Using HSMM and Suprasegmental Features

    Makoto TACHIBANA  Junichi YAMAGISHI  Takashi MASUKO  Takao KOBAYASHI  

     
    PAPER-Speech Synthesis

      Vol: E89-D No:3  Page(s): 1092-1099

    This paper proposes a technique for synthesizing speech with a desired speaking style and/or emotional expression, based on model adaptation in an HMM-based speech synthesis framework. Speaking styles and emotional expressions are characterized by many segmental and suprasegmental features in both the spectral and prosodic domains, so it is essential to take these features into account during model adaptation. The proposed technique, called style adaptation, deals with this issue. First, a maximum likelihood linear regression (MLLR) algorithm based on the hidden semi-Markov model (HSMM) framework is presented to provide mathematically rigorous and robust adaptation of state durations and to adapt both spectral and prosodic features. Then, a novel tying method for the regression matrices of the MLLR algorithm is presented to allow both segmental and suprasegmental speech features to be incorporated into the style adaptation. The proposed tying method uses regression class trees with contextual information. Results of several subjective tests show that these techniques can perform style adaptation while maintaining the naturalness of the synthetic speech.
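
The tying idea — routing each distribution through a contextual regression-class decision to a shared MLLR transform — can be sketched as below. The class names, contextual questions, and transform values are invented for illustration:

```python
import numpy as np

def regression_class(stream, context):
    """Toy regression-class tree: contextual questions route each
    HSMM distribution to a tied transform."""
    if stream == "duration":
        return "dur_class"            # suprasegmental tying
    if context.get("accented"):
        return "spec_accented"        # segmental, accented contexts
    return "spec_plain"

# One tied MLLR transform per regression class (extended-mean form).
transforms = {
    "dur_class": np.array([[1.2, 0.0]]),                         # 1 x 2
    "spec_accented": np.hstack([1.1 * np.eye(2), np.zeros((2, 1))]),
    "spec_plain": np.hstack([np.eye(2), np.full((2, 1), 0.1)]),
}

def adapt_mean(mu, stream, context):
    W = transforms[regression_class(stream, context)]
    return W @ np.append(mu, 1.0)

mu_spec = np.array([0.5, -0.5])
adapted = adapt_mean(mu_spec, "spectrum", {"accented": True})
```

Tying this way lets sparse adaptation data shared across a class still move every distribution in it, while contextual questions keep segmental and suprasegmental behavior separate.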

  • Continuous Speech Recognition Using an On-Line Speaker Adaptation Method Based on Automatic Speaker Clustering

    Wei ZHANG  Seiichi NAKAGAWA  

     
    PAPER-Speech and Speaker Recognition

      Vol: E86-D No:3  Page(s): 464-473

    This paper evaluates an on-line incremental speaker adaptation method for co-channel conversation involving multiple speakers, under the assumption that the speaker is unknown and changes frequently. After speaker clustering based on Vector Quantization (VQ) distortion is performed for every utterance, the acoustic models for each cluster are adapted by Maximum Likelihood Linear Regression (MLLR) or Maximum A Posteriori (MAP) estimation, improving continuous speech recognition performance. To demonstrate that the speaker clustering method improves continuous speech recognition, experiments with supervised and unsupervised cluster adaptation were conducted. Finally, evaluation experiments on separately prepared test data were performed for continuous syllable recognition and large vocabulary continuous speech recognition (LVCSR). The experimental results strongly support the effectiveness of the speaker adaptation and clustering methods presented in this paper.
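
The clustering step — assigning each utterance to the cluster whose codebook gives the lowest VQ distortion, before applying that cluster's adapted models — can be sketched as follows. The codebooks and data are toy values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy codebooks (VQ centroids) for two known speaker clusters.
codebooks = {
    "cluster_a": rng.normal(loc=0.0, size=(8, 2)),
    "cluster_b": rng.normal(loc=5.0, size=(8, 2)),
}

def vq_distortion(frames, codebook):
    """Average distance from each frame to its nearest centroid."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def assign_cluster(frames, codebooks):
    """Pick the cluster with the lowest VQ distortion; that cluster's
    MLLR/MAP-adapted acoustic models would then decode the utterance."""
    return min(codebooks, key=lambda k: vq_distortion(frames, codebooks[k]))

utterance = rng.normal(loc=5.0, size=(30, 2))   # frames near cluster_b
```

In the incremental setting, a new cluster would be spawned when every existing codebook's distortion exceeds a threshold, and the winning cluster's models are further adapted with the utterance.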