IEICE TRANSACTIONS on Information

A VoiceFont Creation Framework for Generating Personalized Voices

Takashi SAITO, Masaharu SAKAMOTO

Summary:

This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. In this paper, a VoiceFont is a voice inventory used to generate a personalized voice. Creating well-formed voice inventories is time-consuming and laborious, which has become a critical issue for speech synthesis systems that attempt to synthesize many high-quality voice personalities. The proposed framework aims to drastically reduce this burden with a twofold approach. First, to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Second, to minimize human intervention in VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler that facilitate checking and validating the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation and its robustness to speaker and style variations. Results on six speech corpora covering a fairly wide range of speaking styles show that the segmentation algorithm is quite accurate and robust in extracting both phoneme and accentual-phrase segments. In addition, to subjectively evaluate VoiceFonts created with the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of speech synthesized by the VoiceFont-based synthesizer are fairly close to those of the donor speakers.

Publication
IEICE TRANSACTIONS on Information Vol.E88-D No.3 pp.525-534
Publication Date
2005/03/01
DOI
10.1093/ietisy/e88-d.3.525
Type of Manuscript
Special Section PAPER (Special Section on Corpus-Based Speech Technologies)
Category
Speech Synthesis and Prosody
