A VoiceFont Creation Framework for Generating Personalized Voices

Takashi SAITO; Masaharu SAKAMOTO

doi:10.1093/ietisy/e88-d.3.525

IEICE TRANSACTIONS on Information

A VoiceFont Creation Framework for Generating Personalized Voices

Takashi SAITO, Masaharu SAKAMOTO

Full Text Views

0

Cite this

Summary :

This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.

Publication: IEICE TRANSACTIONS on Information Vol.E88-D No.3 pp.525-534

Publication Date: 2005/03/01

Publicized

Online ISSN

DOI: 10.1093/ietisy/e88-d.3.525

Type of Manuscript: Special Section PAPER (Special Section on Corpus-Based Speech Technologies)

Category: Speech Synthesis and Prosody

Cite this

Copy

Takashi SAITO, Masaharu SAKAMOTO, "A VoiceFont Creation Framework for Generating Personalized Voices" in IEICE TRANSACTIONS on Information, vol. E88-D, no. 3, pp. 525-534, March 2005, doi: 10.1093/ietisy/e88-d.3.525.
Abstract: This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.3.525/_p

Copy

@ARTICLE{e88-d_3_525,
author={Takashi SAITO, Masaharu SAKAMOTO, },
journal={IEICE TRANSACTIONS on Information},
title={A VoiceFont Creation Framework for Generating Personalized Voices},
year={2005},
volume={E88-D},
number={3},
pages={525-534},
abstract={This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.},
keywords={},
doi={10.1093/ietisy/e88-d.3.525},
ISSN={},
month={March},}

Copy

TY - JOUR
TI - A VoiceFont Creation Framework for Generating Personalized Voices
T2 - IEICE TRANSACTIONS on Information
SP - 525
EP - 534
AU - Takashi SAITO
AU - Masaharu SAKAMOTO
PY - 2005
DO - 10.1093/ietisy/e88-d.3.525
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2005
AB - This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.
ER -

IEICE TRANSACTIONS on Information

A VoiceFont Creation Framework for Generating Personalized Voices

Summary :

Authors

Keyword

Latest Issue

Contents

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles

IEICE TRANSACTIONS on Information

A VoiceFont Creation Framework for Generating Personalized Voices

Summary :

Authors

Keyword

Latest Issue

Contents

Copyrights notice of machine-translated contents

Cite this

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles