Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis

Shinsuke SAKAI; Tatsuya KAWAHARA; Hisashi KAWAI

doi:10.1587/transinf.E94.D.2006

IEICE TRANSACTIONS on Information


Summary:

The measure of the goodness, or inversely the cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of concatenation is measured by the conditional probability of observing the spectral shape of the current candidate unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector is a linear transform of the past spectral shape. Decision tree-based parameter tying is performed to achieve robust training that balances model complexity against the amount of training data available. The concatenation models were implemented for a corpus-based speech synthesizer, and the effectiveness of the proposed method was confirmed by an objective evaluation as well as a subjective listening test. We also demonstrate that the proposed method generalizes some popular conventional methods, in that those methods can be derived as special cases of the proposed method.
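
The scoring idea in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `concatenation_cost`, the parameters `A`, `b`, and `var`, and the diagonal covariance are all assumptions made here for brevity. The summary describes context-dependent parameters tied by a decision tree, whereas this sketch uses a single parameter set.

```python
import math


def concatenation_cost(prev_spec, cur_spec, A, b, var):
    """Negative log-likelihood of cur_spec under a conditional Gaussian.

    The mean is a linear transform of the previous unit's spectral vector
    (A * prev_spec + b). A diagonal covariance `var` is assumed here purely
    to keep the sketch dependency-free.
    """
    dim = len(cur_spec)
    # Predicted mean of the current spectral shape given the previous one.
    mean = [
        sum(A[i][j] * prev_spec[j] for j in range(len(prev_spec))) + b[i]
        for i in range(dim)
    ]
    nll = 0.0
    for i in range(dim):
        diff = cur_spec[i] - mean[i]
        # Per-dimension Gaussian negative log-density.
        nll += 0.5 * (math.log(2.0 * math.pi * var[i]) + diff * diff / var[i])
    return nll


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero = [0.0, 0.0]
    unit = [1.0, 1.0]
    # Lower cost means a more probable, hence better, concatenation.
    print(concatenation_cost([1.0, 2.0], [1.5, 1.0], identity, zero, unit))
```

With `A` set to the identity, `b` to zero, and unit variances, the cost reduces to half the squared Euclidean distance between the two spectral vectors plus a constant. That is one way a distance-based join cost can fall out of a model of this form, though whether it matches the special cases derived in the paper is an assumption here.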

Publication: IEICE TRANSACTIONS on Information Vol.E94-D No.10 pp.2006-2014

Publication Date: 2011/10/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E94.D.2006

Type of Manuscript: PAPER

Category: Speech and Hearing

Shinsuke SAKAI, Tatsuya KAWAHARA, Hisashi KAWAI, "Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis" in IEICE TRANSACTIONS on Information, vol. E94-D, no. 10, pp. 2006-2014, October 2011, doi: 10.1587/transinf.E94.D.2006.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E94.D.2006/_p

@ARTICLE{e94-d_10_2006,
author={Shinsuke SAKAI and Tatsuya KAWAHARA and Hisashi KAWAI},
journal={IEICE TRANSACTIONS on Information},
title={Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis},
year={2011},
volume={E94-D},
number={10},
pages={2006--2014},
abstract={The measure of the goodness, or inversely the cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of concatenation is measured by the conditional probability of observing the spectral shape of the current candidate unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector has a form of linear transform of the past spectral shape. Decision tree-based parameter tying is performed to achieve robust training that balances between model complexity and the amount of training data available. The concatenation models are implemented for a corpus-based speech synthesizer, and the effectiveness of the proposed method was confirmed by an objective evaluation as well as a subjective listening test. We also demonstrate that the proposed method generalizes some popular conventional methods in that those methods can be derived as the special cases of the proposed method.},
keywords={},
doi={10.1587/transinf.E94.D.2006},
ISSN={1745-1361},
month={October},
}

TY  - JOUR
TI  - Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis
T2  - IEICE TRANSACTIONS on Information
SP  - 2006
EP  - 2014
AU  - Shinsuke SAKAI
AU  - Tatsuya KAWAHARA
AU  - Hisashi KAWAI
PY  - 2011
DO  - 10.1587/transinf.E94.D.2006
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E94-D
IS  - 10
JA  - IEICE TRANSACTIONS on Information
Y1  - October 2011
AB  - The measure of the goodness, or inversely the cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of concatenation is measured by the conditional probability of observing the spectral shape of the current candidate unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector has a form of linear transform of the past spectral shape. Decision tree-based parameter tying is performed to achieve robust training that balances between model complexity and the amount of training data available. The concatenation models are implemented for a corpus-based speech synthesizer, and the effectiveness of the proposed method was confirmed by an objective evaluation as well as a subjective listening test. We also demonstrate that the proposed method generalizes some popular conventional methods in that those methods can be derived as the special cases of the proposed method.
ER  -
