IEICE TRANSACTIONS on Information and Systems

Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-Based Speech Synthesis

Junichi YAMAGISHI, Koji ONISHI, Takashi MASUKO, Takao KOBAYASHI

Summary:

This paper describes the modeling of various emotional expressions and speaking styles in synthetic speech using HMM-based speech synthesis. We present two methods for modeling speaking styles and emotional expressions. In the first method, style-dependent modeling, each speaking style and emotional expression is modeled individually. In the second, style-mixed modeling, each speaking style and emotional expression is treated as one of the contexts, in the same way as phonetic, prosodic, and linguistic features, and all speaking styles and emotional expressions are modeled simultaneously by a single acoustic model. We chose four styles of read speech, namely neutral, rough, joyful, and sad, and compared the two modeling methods on these styles. The results of subjective evaluation tests show that both methods achieve almost the same accuracy and that it is possible to synthesize speech whose speaking style and emotional expression are similar to those of the target speech. In a style classification test on synthesized speech, more than 80% of the samples generated with either model were judged to be similar to the target styles. We also show that the style-mixed modeling method yields fewer output and duration distributions than the style-dependent modeling method.
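To illustrate the distinction between the two methods, the following is a minimal Python sketch of how a speaking style can either select a separate model set (style-dependent) or enter the context label itself (style-mixed). The label format, the `/S:` field, and all helper names are illustrative assumptions, not the paper's actual full-context label specification.

```python
# Minimal sketch contrasting the two modeling schemes described above.
# The label format and field names are illustrative assumptions, not the
# paper's actual full-context label specification.

STYLES = ["neutral", "rough", "joyful", "sad"]

def make_context_label(phone, left, right, style=None):
    """Build a simplified full-context label for one phone.

    Real HMM-based synthesis labels encode many more phonetic,
    prosodic, and linguistic contexts; a triphone plus an optional
    style field suffices to show the idea here.
    """
    label = f"{left}-{phone}+{right}"
    if style is not None:
        # Style-mixed modeling: the style is appended as one more
        # context factor, so a single model set covers all styles.
        label += f"/S:{style}"
    return label

# Style-dependent modeling: one independent model set per style,
# each trained only on that style's data.
style_dependent = {s: f"model_set_{s}" for s in STYLES}
print(sorted(style_dependent))  # four separate model sets

# Style-mixed modeling: one shared model set; the style enters
# through the context label instead of through separate models.
print(make_context_label("a", "k", "i"))                  # no style factor
print(make_context_label("a", "k", "i", style="joyful"))  # style as context
```

Because the style-mixed scheme folds the style into the context, decision-tree context clustering can share distributions across styles wherever the data allows, which is consistent with the paper's finding that it produces fewer output and duration distributions than style-dependent modeling.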

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E88-D No.3 pp.502-509
Publication Date
2005/03/01
DOI
10.1093/ietisy/e88-d.3.502
Type of Manuscript
Special Section PAPER (Special Section on Corpus-Based Speech Technologies)
Category
Speech Synthesis and Prosody
