IEICE TRANSACTIONS on Information

A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models

Shinnosuke TAKAMICHI, Tomoki TODA, Graham NEUBIG, Sakriani SAKTI, Satoshi NAKAMURA

Summary:

This paper presents a novel statistical sample-based approach to Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC offers flexible model adaptation, the quality of converted speech is significantly worse than that of natural speech. This paper addresses inaccurate modeling, one of the main causes of this quality degradation. We previously proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. That method produces high-quality speech by introducing ideas from unit selection synthesis while preserving the flexibility of the original HMM-based TTS. In this paper, we apply the same idea to GMM-based VC. Rich context models are first trained for individual joint speech feature vectors and are then gathered mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, speech parameters are converted by an iterative generation algorithm using R-GMMs, after initialization using over-trained probability distributions. Because the proposed method utilizes individual speech features and shares its formulation with conventional GMM-based VC, it can produce high-quality speech while keeping the flexibility of the original GMM-based VC. Experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.
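
For orientation, the conventional GMM-based VC that the summary takes as its starting point is commonly described as a mapping through a GMM fitted to joint source-target feature vectors: each mixture component contributes a linear regression from source to target, weighted by the component's posterior probability given the source feature. The sketch below illustrates only that conventional mapping, not the R-GMM procedure or the iterative generation algorithm of the paper. It assumes one-dimensional features so that no matrix algebra is needed, and every name in it (JointComponent, posteriors, convert) is hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One mixture component of a joint (source x, target y) GMM.

    Assumption: scalar features, so covariances reduce to plain numbers.
    """
    weight: float   # mixture weight
    mean_x: float   # mean of the source feature
    mean_y: float   # mean of the target feature
    var_xx: float   # variance of the source feature
    cov_xy: float   # cross-covariance between source and target


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, components):
    """Posterior probability of each component given the source feature x."""
    likes = [c.weight * gaussian_pdf(x, c.mean_x, c.var_xx) for c in components]
    total = sum(likes)  # may underflow to 0 for x far from every component
    return [like / total for like in likes]


def convert(x, components):
    """Posterior-weighted sum of per-component conditional means E[y | x, m]."""
    gammas = posteriors(x, components)
    return sum(
        g * (c.mean_y + c.cov_xy / c.var_xx * (x - c.mean_x))
        for g, c in zip(gammas, components)
    )
```

With a single component the mapping reduces to one linear regression line; with several components, the posteriors blend the regression lines smoothly across the source feature space. This averaging is one commonly cited source of over-smoothed converted speech, which is the kind of modeling inaccuracy the summary refers to.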

Publication
IEICE TRANSACTIONS on Information Vol.E99-D No.10 pp.2490-2498
Publication Date
2016/10/01
Publicized
2016/07/19
Online ISSN
1745-1361
DOI
10.1587/transinf.2016SLP0020
Type of Manuscript
Special Section PAPER (Special Section on Recent Advances in Machine Learning for Spoken Language Processing)
Category
Voice conversion

Authors

Shinnosuke TAKAMICHI
  Nara Institute of Science and Technology
Tomoki TODA
  Nagoya University
Graham NEUBIG
  Nara Institute of Science and Technology
Sakriani SAKTI
  Nara Institute of Science and Technology
Satoshi NAKAMURA
  Nara Institute of Science and Technology

Keyword