Author Search Result

[Author] Masami AKAMINE(3hit)

1-3hit
  • Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model

    Yamato OHTANI  Masatsune TAMURA  Masahiro MORITA  Masami AKAMINE  

     
    PAPER-Voice conversion

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2481-2489

    This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM), in which each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can perform BWE from speech data with an arbitrary frequency bandwidth, whereas conventional methods perform the conversion from fixed narrow-band data. In the proposed method, we train a GMM in advance with SBM parameters extracted from full-band spectra. According to the bandwidth of the input signal, the trained GMM is reconstructed into a GMM of the joint probability density between the low-band and high-band SBM components. The high-band SBM components are then estimated from the low-band SBM components of the input signal based on the reconstructed GMM. Finally, BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal. To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method extends the bandwidth of speech data robustly for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal achieves the same performance as the restored and the full-band aperiodic components in the proposed method.
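
    The estimation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a diagonal-covariance GMM over a full-band parameter vector, treats the first k dimensions as the observed low band, and uses a posterior-weighted mean as the estimate. The function names and the toy model values are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def estimate_high_band(low, weights, means, variances):
    """Estimate the unobserved high-band dimensions from the low-band ones.

    `low` holds the first k dimensions of a full-band vector; the split
    point k is taken from the input length, so the same trained model
    serves inputs of different bandwidths (assumption of this sketch).
    """
    k = len(low)
    # With diagonal covariances, marginalising over the missing
    # dimensions simply drops them from each component.
    log_post = [
        math.log(w) + log_gauss_diag(low, mu[:k], var[:k])
        for w, mu, var in zip(weights, means, variances)
    ]
    peak = max(log_post)  # subtract the peak for numerical stability
    post = [math.exp(lp - peak) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]
    n_high = len(means[0]) - k
    # Posterior-weighted average of the high-band component means.
    return [
        sum(p * mu[k + d] for p, mu in zip(post, means))
        for d in range(n_high)
    ]


# Toy two-component model over a 3-dimensional "full-band" vector.
weights = [0.5, 0.5]
means = [[0.0, 0.0, 1.0], [5.0, 5.0, 9.0]]
variances = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

wide_input = estimate_high_band([0.0, 0.0], weights, means, variances)
narrow_input = estimate_high_band([0.0], weights, means, variances)
```

    Calling the function with a 2-dimensional and a 1-dimensional input shows the same model being split at different points. A full-covariance model would additionally need the cross-covariance term in each component's conditional mean.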

  • Adaptive Density Pulse Excitation for Low Bit Rate Speech Coding

    Masami AKAMINE  Kimio MISEKI  

     
    PAPER-Digital Signal Processing

      Vol:
    E78-A No:2
      Page(s):
    199-207

    An excitation signal for a synthesis filter plays an important role in producing high quality speech at a low bit rate. This paper presents a new efficient excitation model, Adaptive Density Pulse (ADP), for low bit-rate speech coding. The ADP is a pulse train whose density (spacing interval) is constant within a subframe but can be varied subframe by subframe. First, the ADP excitation signal is defined. A procedure for finding the optimal ADP excitation is then presented, and the effects of the ADP parameters on the synthesized speech quality are discussed. ADP excitation is introduced into the CELP (Code Excited Linear Prediction) coding method to improve speech quality at bit rates around 4 kbps. A CELP coder with ADP excitation (ADP-CELP) is described. ADP excitation makes it possible for the CELP coder to follow transient portions of speech signals. ADP excitation can also reduce the computational complexity of selecting the best excitation from a codebook, which has been the primary drawback of CELP. The number of multiplications can be reduced to the order of 1/D² by utilizing the sparseness of ADP excitation, where D is the pulse interval. The authors evaluated the speech quality of a 4 kbps ADP-CELP coder by computer simulation. ADP excitation improved the performance of conventional CELP in segmental SNR.
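
    As a rough illustration of the pulse train and of why sparseness helps, the sketch below builds an excitation with one pulse every D samples and counts the product terms in a quadratic form that visits only the pulse positions. The `phase` offset, the function names, and the choice of a quadratic energy term as the place where the 1/D² saving shows up are assumptions of this sketch, not details taken from the paper.

```python
def adp_positions(interval, subframe_len, phase=0):
    """Pulse positions: one pulse every `interval` samples from `phase`."""
    return list(range(phase, subframe_len, interval))


def adp_excitation(amplitudes, interval, subframe_len, phase=0):
    """Build a subframe that is zero everywhere except at the pulses."""
    excitation = [0.0] * subframe_len
    for pos, amp in zip(adp_positions(interval, subframe_len, phase), amplitudes):
        excitation[pos] = amp
    return excitation


def sparse_quadratic_form(excitation, positions, phi):
    """Compute c^T Phi c over pulse positions only.

    Returns the value and the number of (i, j) terms visited.
    """
    value = 0.0
    terms = 0
    for i in positions:
        for j in positions:
            value += excitation[i] * phi[i][j] * excitation[j]
            terms += 1
    return value, terms


subframe_len = 40  # hypothetical subframe length in samples
interval = 4       # pulse interval D
positions = adp_positions(interval, subframe_len)
excitation = adp_excitation([1.0] * len(positions), interval, subframe_len)

# Identity matrix stands in for the filter correlation matrix.
phi = [[1.0 if r == c else 0.0 for c in range(subframe_len)]
       for r in range(subframe_len)]
value, sparse_terms = sparse_quadratic_form(excitation, positions, phi)
dense_terms = subframe_len * subframe_len
```

    With these toy values, the sparse loop visits (40/4)² pairs against 40² for a dense vector, a ratio of 1/D².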

  • Decision Tree-Based Acoustic Models for Speech Recognition with Improved Smoothness

    Masami AKAMINE  Jitendra AJMERA  

     
    PAPER-Speech and Hearing

      Vol:
    E94-D No:11
      Page(s):
    2250-2258

    This paper proposes likelihood smoothing techniques to improve decision tree-based acoustic models, where decision trees are used as replacements for Gaussian mixture models to compute the observation likelihoods for a given HMM state in a speech recognition system. Decision trees have a number of advantageous properties, such as not imposing restrictions on the number or types of features, and automatically performing feature selection. This paper describes basic configurations of decision tree-based acoustic models and proposes two methods to improve the robustness of the basic model: DT mixture models and soft decisions for continuous features. Experimental results for the Aurora 2 speech database show that a system using decision trees offers state-of-the-art performance, even without taking advantage of its full potential, and that soft decisions improve the performance of DT-based acoustic models with a 16.8% relative error rate reduction over hard decisions.
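
    The difference between hard and soft decisions on a continuous feature can be sketched for a single tree node as follows. The sigmoid weighting and the `scale` parameter are assumptions chosen for illustration; the abstract does not say which soft decision function the authors use.

```python
import math


def hard_likelihood(x, threshold, left_lik, right_lik):
    """Hard decision: the feature value selects exactly one leaf."""
    return left_lik if x < threshold else right_lik


def soft_likelihood(x, threshold, left_lik, right_lik, scale=1.0):
    """Soft decision: blend both leaves with a sigmoid weight.

    Near the threshold the output changes gradually instead of
    jumping, which smooths the likelihood as a function of x.
    """
    p_right = 1.0 / (1.0 + math.exp(-(x - threshold) / scale))
    return (1.0 - p_right) * left_lik + p_right * right_lik


# Hypothetical leaf likelihoods on either side of a threshold at 0.5.
just_below = hard_likelihood(0.49, 0.5, 0.2, 0.8)
just_above = hard_likelihood(0.51, 0.5, 0.2, 0.8)
at_threshold = soft_likelihood(0.5, 0.5, 0.2, 0.8)
```

    The hard version jumps between the two leaf values across the threshold, while the soft version returns their average exactly at the threshold and approaches each leaf value far from it.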