Average-Voice-Based Speech Synthesis Using HSMM-Based Speaker Adaptation and Adaptive Training

Junichi YAMAGISHI, Takao KOBAYASHI

Summary:

In speaker adaptation for speech synthesis, it is desirable to convert both voice characteristics and prosodic features such as F0 and phone duration. For simultaneous adaptation of spectrum, F0 and phone duration within the HMM framework, we need to transform not only the state output distributions corresponding to spectrum and F0 but also the duration distributions corresponding to phone duration. However, it is not straightforward to adapt the state duration because the original HMM does not have explicit duration distributions. Therefore, we utilize the framework of the hidden semi-Markov model (HSMM), which is an HMM having explicit state duration distributions, and we apply an HSMM-based model adaptation algorithm to simultaneously transform both the state output and state duration distributions. Furthermore, we propose an HSMM-based adaptive training algorithm to simultaneously normalize the state output and state duration distributions of the average voice model. We incorporate these techniques into our HSMM-based speech synthesis system, and show their effectiveness from the results of subjective and objective evaluation tests.
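
The idea of transforming the state output and state duration distributions together can be illustrated with a minimal sketch. The summary does not state the form of the transform, so the code below assumes an affine (MLLR-style) transform of Gaussian means, with a Gaussian state duration distribution; all function and parameter names are hypothetical and the numbers are made up for illustration.

```python
import numpy as np

def adapt_state(output_mean, duration_mean, A, b, a_dur, b_dur):
    """Adapt one HSMM state under an assumed affine mean transform.

    output_mean   : mean vector of the state output Gaussian (spectrum/F0)
    duration_mean : scalar mean of the state duration Gaussian (in frames)
    A, b          : assumed affine transform for the output mean
    a_dur, b_dur  : assumed scalar affine transform for the duration mean
    """
    new_output_mean = A @ output_mean + b               # output distribution
    new_duration_mean = a_dur * duration_mean + b_dur   # duration distribution
    return new_output_mean, new_duration_mean

# Illustrative values only (not taken from the paper).
avg_output_mean = np.array([1.0, 2.0])
avg_duration_mean = 5.0
A = 2.0 * np.eye(2)
b = np.array([0.5, -0.5])

adapted_output, adapted_duration = adapt_state(
    avg_output_mean, avg_duration_mean, A, b, a_dur=1.2, b_dur=0.5
)
```

In this sketch the average voice model supplies the starting means, and a single call moves both the output mean and the duration mean toward the target speaker, which is the point of using explicit duration distributions.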

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E90-D No.2 pp.533-543
Publication Date
2007/02/01
Online ISSN
1745-1361
DOI
10.1093/ietisy/e90-d.2.533
Type of Manuscript
PAPER
Category
Speech and Hearing
