
Author Search Result

[Author] Makoto SAKAI (4 hits)

  • Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training

    Makoto SAKAI  Norihide KITAOKA  Yuya HATTORI  Seiichi NAKAGAWA  Kazuya TAKEDA  

     
    LETTER-Speech and Hearing

    Vol: E93-D  No: 2  Page(s): 395-398

    To improve speech recognition performance, acoustic feature transformation based on discriminant analysis has been widely used. For the same purpose, discriminative training of HMMs has also been used. In this letter, we investigate the effectiveness of these two techniques and their combination. We also investigate their robustness under matched and mismatched noise conditions between the training and evaluation environments.
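As a rough illustration of what a discriminant-analysis-based feature transformation does, the sketch below computes a classical LDA projection with NumPy. It is not the letter's experimental setup: the function name `lda_projection`, the toy Gaussian data, and the choice of plain LDA (rather than any specific variant evaluated in the letter) are assumptions made for illustration only.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Return an (n_components, d) projection matrix from classical LDA.

    X: (n_samples, d) feature vectors; y: (n_samples,) integer class labels.
    """
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sw += (Xc - mu).T @ (Xc - mu)
        diff = (mu - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of Sw^{-1} Sb; keep those with the largest eigenvalues.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    return evecs.real[:, order].T

# Toy data (hypothetical): two classes separated along the first axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 3)),
               rng.normal(size=(200, 3)) + np.array([5.0, 0.0, 0.0])])
y = np.repeat([0, 1], 200)

B = lda_projection(X, y, n_components=1)
Z = X @ B.T  # transformed (reduced) features, shape (400, 1)
```

In this toy case the learned direction is dominated by the first axis, which is the only axis along which the two classes differ.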

  • Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition

    Makoto SAKAI  Norihide KITAOKA  Seiichi NAKAGAWA  

     
    PAPER-Feature Extraction

    Vol: E91-D  No: 3  Page(s): 478-487

    Precisely modeling the time dependency of features is an important issue in speech recognition. Segmental unit input HMM with a dimensionality reduction method has been widely used to address this issue. Linear discriminant analysis (LDA) and its heteroscedastic extensions, e.g., heteroscedastic linear discriminant analysis (HLDA) and heteroscedastic discriminant analysis (HDA), are popular approaches to reducing dimensionality. However, it is difficult to find a single criterion that suits every kind of data set when reducing dimensionality while preserving discriminative information. In this paper, we propose a new framework, which we call power linear discriminant analysis (PLDA). PLDA can describe various criteria, including LDA, HLDA, and HDA, with one control parameter. In addition, we provide an efficient selection method using a control parameter without training HMMs or testing recognition performance on a development data set. Experimental results show that PLDA is more effective than conventional methods for various data sets.
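The abstract does not state the PLDA criterion itself, but the paper's title refers to a generalized mean of class covariances. The sketch below shows one way such a mean could be computed, assuming it takes the form of a weighted power mean of symmetric positive-definite matrices with exponent `m` playing the role of the control parameter. The function names, the exact formula, and the example matrices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def spd_power(mat, p):
    """Raise a symmetric positive-definite matrix to a real power p."""
    w, v = np.linalg.eigh(mat)
    return (v * w**p) @ v.T  # V diag(w^p) V^T

def power_mean_covariance(covs, priors, m):
    """Weighted power mean (sum_k P_k * Sigma_k^m)^(1/m), for m != 0."""
    acc = sum(p * spd_power(c, m) for p, c in zip(priors, covs))
    return spd_power(acc, 1.0 / m)

# Two hypothetical class covariances with equal class weights.
covs = [np.array([[2.0, 0.3], [0.3, 1.0]]),
        np.array([[1.0, -0.2], [-0.2, 3.0]])]
priors = [0.5, 0.5]

arith = power_mean_covariance(covs, priors, 1.0)   # arithmetic mean
harm = power_mean_covariance(covs, priors, -1.0)   # harmonic mean
```

With `m = 1` the result is the weighted arithmetic mean of the class covariances, which is the pooled within-class covariance used by classical LDA. Other values of `m` weight the class covariances differently, which is how a single parameter could span a family of criteria under this assumed form.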

  • Acoustic Feature Transformation Combining Average and Maximum Classification Error Minimization Criteria

    Makoto SAKAI  Norihide KITAOKA  Kazuya TAKEDA  

     
    LETTER-Speech and Hearing

    Vol: E93-D  No: 7  Page(s): 2005-2008

    Acoustic feature transformation is widely used to reduce dimensionality and improve speech recognition performance. In this letter, we focus on dimensionality reduction methods that minimize the average classification error. Unfortunately, minimizing the average classification error may cause considerable overlap between the distributions of some classes. To mitigate the risk of such overlap, we propose a dimensionality reduction method that minimizes the maximum classification error. We also propose two interpolated methods that can describe the average and maximum classification errors. Experimental results show that these proposed methods improve speech recognition performance.
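To make the difference between an average and a maximum criterion concrete, the sketch below scores a candidate projection by pairwise error bounds between Gaussian classes and then reports both the mean and the worst pair. The abstract does not say which error measure the letter uses, so the Bhattacharyya bound is swapped in here as a stand-in. The function names, class parameters, and candidate projections are hypothetical.

```python
import itertools
import numpy as np

def bhattacharyya_bound(mu1, cov1, mu2, cov2, p1, p2):
    """Upper bound sqrt(p1*p2)*exp(-D_B) on the two-class Gaussian error."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    d = 0.125 * diff @ np.linalg.solve(cov, diff)
    d += 0.5 * np.log(np.linalg.det(cov)
                      / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return np.sqrt(p1 * p2) * np.exp(-d)

def pairwise_error_bounds(B, means, covs, priors):
    """Bounds for every class pair after projecting with matrix B."""
    proj = [(B @ mu, B @ c @ B.T) for mu, c in zip(means, covs)]
    pairs = itertools.combinations(range(len(proj)), 2)
    return np.array([bhattacharyya_bound(*proj[i], *proj[j],
                                         priors[i], priors[j])
                     for i, j in pairs])

# Three hypothetical classes in 2-D, projected onto one axis at a time.
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.5, 2.0])]
covs = [np.eye(2), np.eye(2), np.eye(2)]
priors = [1 / 3, 1 / 3, 1 / 3]

for name, B in [("x-axis", np.array([[1.0, 0.0]])),
                ("y-axis", np.array([[0.0, 1.0]]))]:
    bounds = pairwise_error_bounds(B, means, covs, priors)
    print(name, "average:", bounds.mean(), "maximum:", bounds.max())
```

The average can look acceptable even when one pair of classes overlaps almost completely after projection, whereas the maximum exposes that pair directly. This is the kind of overlap the abstract describes as a risk of average-only criteria.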

  • Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition

    Makoto SAKAI  Norihide KITAOKA  Kazuya TAKEDA  

     
    PAPER-Speech and Hearing

    Vol: E93-D  No: 5  Page(s): 1244-1252

    To improve speech recognition performance, feature transformation based on discriminant analysis has been widely used to reduce the redundant dimensions of acoustic features. Linear discriminant analysis (LDA) and heteroscedastic discriminant analysis (HDA) are often used for this purpose, and a generalization of LDA and HDA, called power LDA (PLDA), has been proposed. However, these methods may result in an unexpected dimensionality reduction for multimodal data. It is important to preserve the local structure of the data when reducing the dimensionality of multimodal data. In this paper, we introduce two methods, locality-preserving HDA and locality-preserving PLDA, to reduce the dimensionality of multimodal data appropriately. We also propose an approximate calculation scheme to calculate sub-optimal projections rapidly. Experimental results show that the locality-preserving methods yield better performance than the traditional ones in speech recognition.