IEICE global.ieice.org Site

Author Search Result

[Author] Shinsuke SAKAI (3 hits)

Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis
Shinsuke SAKAI, Tatsuya KAWAHARA, Hisashi KAWAI

PAPER-Speech and Hearing

Vol: E94-D, No: 10, Page(s): 2006-2014
The measure of the goodness, or inversely the cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of a concatenation is measured by the conditional probability of observing the spectral shape of the current candidate unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector is a linear transform of the past spectral shape. Decision tree-based parameter tying is performed to achieve robust training that balances model complexity against the amount of available training data. The concatenation models are implemented in a corpus-based speech synthesizer, and the effectiveness of the proposed method was confirmed by an objective evaluation as well as a subjective listening test. We also demonstrate that the proposed method generalizes some popular conventional methods, in that those methods can be derived as special cases of the proposed method.
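The abstract scores a join by a conditional Gaussian whose mean is a linear transform of the previous unit's spectral shape. Below is a minimal sketch of the corresponding cost, the negative log-density $-\log \mathcal{N}(y_t;\, A y_{t-1} + b,\, \Sigma)$. It assumes a diagonal covariance and plain Python lists for vectors. The names `concat_cost`, `A`, `b` and `var` are our own, and the decision-tree selection of tied parameters per phonetic context is omitted. Treat it as an illustration, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical parameters of one tied context cluster: (A, b, diagonal variances).
Params = Tuple[List[List[float]], List[float], List[float]]


def concat_cost(curr: Sequence[float], prev: Sequence[float], params: Params) -> float:
    """Return -log N(curr; A @ prev + b, diag(var)).

    A lower cost means the spectral shape of the current candidate is more
    probable given the previous unit, i.e. a better join.
    """
    A, b, var = params
    cost = 0.0
    for i in range(len(curr)):
        # Predicted mean: i-th component of the linear transform of the past shape.
        mean_i = sum(A[i][j] * prev[j] for j in range(len(prev))) + b[i]
        diff = curr[i] - mean_i
        # Per-dimension Gaussian negative log-likelihood (diagonal covariance).
        cost += 0.5 * (diff * diff / var[i] + math.log(2.0 * math.pi * var[i]))
    return cost
```

With $A = I$, $b = 0$ and unit variances, the cost reduces to half the squared Euclidean distance between the two spectral vectors plus a constant. This is our own illustration of how a distance-based cost can arise as a special case; it is not necessarily one of the special cases derived in the paper.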
Fundamental Frequency Modeling for Speech Synthesis Based on a Statistical Learning Technique
Shinsuke SAKAI

PAPER-Speech Synthesis and Prosody

Vol: E88-D, No: 3, Page(s): 489-495
This paper proposes a novel multi-layer approach to fundamental frequency (F0) modeling for concatenative speech synthesis based on a statistical learning technique called additive models. We define an additive F0 contour model consisting of a long-term, intonational phrase-level component and a short-term, accentual phrase-level component, along with a least-squares error criterion that includes a regularization term. A backfitting algorithm derived from this error criterion estimates both components simultaneously by iteratively applying cubic spline smoothers. When this method is applied to a 7,000-utterance Japanese speech corpus, it achieves F0 RMS errors of 28.9 Hz and 29.8 Hz on the training and test data, respectively, with corresponding correlation coefficients of 0.806 and 0.777. The automatically determined intonational and accentual phrase components turn out to behave smoothly, systematically, and intuitively under a variety of prosodic conditions.
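The backfitting idea can be sketched generically for an additive model $y \approx \alpha + f_1(x_1) + f_2(x_2)$. Each component is repeatedly re-estimated by smoothing the partial residual left after removing the intercept and all other components. This is a minimal sketch under stated assumptions. A box-kernel running-mean smoother (`box_smoother`, our own name) is swapped in for the paper's cubic spline smoothers to keep the example dependency-free. Smoothness is therefore controlled by a bandwidth rather than an explicit regularization term. The covariates in the usage test are toy values, not the paper's prosodic features.

```python
from typing import Callable, List, Sequence, Tuple

Smoother = Callable[[Sequence[float], Sequence[float]], List[float]]


def box_smoother(bandwidth: float) -> Smoother:
    """Running-mean smoother (stand-in for a cubic spline smoother).

    Each fitted value is the mean of the residuals whose covariate lies
    within `bandwidth` of the point being fitted.
    """
    def smooth(x: Sequence[float], r: Sequence[float]) -> List[float]:
        out = []
        for xi in x:
            window = [rk for xk, rk in zip(x, r) if abs(xk - xi) <= bandwidth]
            out.append(sum(window) / len(window))
        return out
    return smooth


def backfit(y: Sequence[float], covariates: Sequence[Sequence[float]],
            smoothers: Sequence[Smoother], max_iter: int = 100,
            tol: float = 1e-10) -> Tuple[float, List[List[float]]]:
    """Fit y ~ alpha + sum_j f_j(x_j) by backfitting; return (alpha, [f_j values])."""
    n = len(y)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in covariates]
    for _ in range(max_iter):
        max_change = 0.0
        for j, (x, smooth) in enumerate(zip(covariates, smoothers)):
            # Partial residual: remove the intercept and every other component.
            partial = [y[i] - alpha - sum(f[k][i] for k in range(len(f)) if k != j)
                       for i in range(n)]
            new = smooth(x, partial)
            # Centre each component so that alpha stays identifiable.
            centre = sum(new) / n
            new = [v - centre for v in new]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new, f[j])))
            f[j] = new
        if max_change < tol:  # components have stopped changing
            break
    return alpha, f
```

In the paper the two components correspond to the intonational phrase and accentual phrase levels, each estimated with a cubic spline smoother. Swapping `box_smoother` for a proper smoothing-spline routine would bring the sketch closer to that setting.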
Admissible Stopping in Viterbi Beam Search for Unit Selection Speech Synthesis
Shinsuke SAKAI, Tatsuya KAWAHARA

PAPER-Speech and Hearing

Vol: E96-D, No: 6, Page(s): 1359-1367
Corpus-based concatenative speech synthesis has been widely investigated and deployed in recent years because it provides highly natural synthesized speech. The amount of computation required at run time, however, can often be quite large. In this paper, we propose early stopping schemes for Viterbi beam search in unit selection, which allow the search to stop early both in the local Viterbi minimization for each unit and in the exploration of candidate units for a given target. The schemes take advantage of the fact that the space of acoustic parameters of the database units is fixed, so certain lower bounds on the concatenation costs can be precomputed. The proposed early stopping is admissible in that it does not change the result of the Viterbi beam search. Experiments using probability-based as well as distance-based concatenation costs show that the proposed admissible stopping methods effectively reduce the amount of computation required in the Viterbi beam search while leaving its result unchanged. Furthermore, the reduction in computation turned out to be much larger when a tighter lower bound on the concatenation costs is available.
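The first of the two schemes, early stopping in the local Viterbi minimization, can be illustrated as follows. This is a minimal sketch, assuming the surviving hypotheses are visited in ascending order of accumulated cost and that a per-unit lower bound `lb[u]`, valid for every database unit as predecessor, has been precomputed. Once `acc + lb[u]` reaches the best cost found so far, no later predecessor can win, so the loop can stop without changing the result. The toy database `UNITS`, the edge-mismatch cost and all function names are hypothetical. The second scheme, stopping in the exploration of candidate units, is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy database: unit id -> (spectral value at left edge, at right edge).
UNITS: Dict[str, Tuple[float, float]] = {
    "a": (0.0, 1.0), "b": (0.5, 2.0), "c": (1.2, 0.3),
    "d": (2.5, 2.2), "e": (0.9, 1.7), "f": (3.0, 0.1),
}


def concat_cost(prev: str, cur: str) -> float:
    """Spectral mismatch at the join: right edge of prev vs. left edge of cur."""
    return abs(UNITS[prev][1] - UNITS[cur][0])


def precompute_lower_bounds() -> Dict[str, float]:
    """lb[u] <= concat_cost(v, u) for every database unit v (computed offline)."""
    return {u: min(concat_cost(v, u) for v in UNITS) for u in UNITS}


def viterbi(lattice: List[List[Tuple[str, float]]], beam_width: int,
            lower_bounds: Optional[Dict[str, float]] = None) -> Tuple[float, List[str], int]:
    """Beam-pruned Viterbi over candidate units.

    lattice: one list of (unit id, target cost) per target.
    If lower_bounds is given, the local minimization stops early (admissibly).
    Returns (best total cost, best unit sequence, number of concat-cost evaluations).
    """
    # Hypothesis = (accumulated cost, unit id, path); the beam is kept sorted by cost.
    beam = sorted((tc, u, [u]) for u, tc in lattice[0])[:beam_width]
    evaluations = 0
    for candidates in lattice[1:]:
        new_beam = []
        for u, target_cost in candidates:
            best, best_path = math.inf, None
            for acc, v, path in beam:
                # Every remaining v has acc' >= acc and concat cost >= lb[u],
                # so none can beat `best`: stopping here is admissible.
                if lower_bounds is not None and acc + lower_bounds[u] >= best:
                    break
                evaluations += 1
                cost = acc + concat_cost(v, u)
                if cost < best:
                    best, best_path = cost, path
            new_beam.append((best + target_cost, u, best_path + [u]))
        beam = sorted(new_beam)[:beam_width]
    total, _, path = beam[0]
    return total, path, evaluations
```

Running `viterbi` on the same lattice with and without `lower_bounds` should give the same cost and path. Only the evaluation count differs. Tighter bounds make the stopping test fire earlier, which agrees with the abstract's observation.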
