When synthesizing speech from Japanese text, correctly assigning accent nuclei to input text of arbitrary content is essential for obtaining natural-sounding synthetic speech. A phenomenon called accent sandhi occurs in Japanese utterances: when a word is uttered in a sentence, its accent nucleus may change depending on the preceding and succeeding words. This paper describes a statistical method for automatically predicting the accent nucleus changes caused by accent sandhi. First, as the basis of the research, a database of Japanese text was constructed with labels of accent phrase boundaries and accent nucleus positions when uttered in sentences. A single native speaker of Tokyo dialect Japanese annotated all the labels for 6,344 Japanese sentences. Then, using this database, a method based on conditional random fields (CRFs) was developed to predict accent phrase boundaries and accent nuclei. The proposed method predicted accent nucleus positions for accent phrases with 94.66% accuracy, clearly surpassing the 87.48% accuracy obtained using our rule-based method. A listening experiment was also conducted to compare synthetic speech generated using the proposed method with that generated using the rule-based method. The results show that our method significantly improved the naturalness of synthetic speech.
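As a rough illustration of the kind of sequence labeling a CRF-based predictor performs, the sketch below decodes accent phrase boundaries with a linear-chain CRF using the Viterbi algorithm. Everything specific here is an assumption made for illustration: the B/I label set, the part-of-speech features, and all weights are hypothetical and hand-set, not the authors' features or trained parameters, and accent nucleus prediction is omitted entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical label set (not from the paper):
# "B" = the word starts a new accent phrase, "I" = it attaches to the previous one.
LABELS = ["B", "I"]

# Hypothetical hand-set weights standing in for trained CRF parameters.
# Emission weights: (feature, label) -> weight.
EMISSION: Dict[Tuple[str, str], float] = {
    ("pos=noun", "B"): 1.5,
    ("pos=noun", "I"): 0.2,
    ("pos=particle", "B"): -2.0,
    ("pos=particle", "I"): 2.0,
    ("pos=verb", "B"): 1.0,
    ("pos=verb", "I"): 0.5,
    ("prev_pos=noun", "I"): 0.8,
    ("bos", "B"): 3.0,  # strongly favor opening a phrase at the sentence start
}
# Transition weights: (previous label, current label) -> weight.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("B", "B"): 0.0, ("B", "I"): 0.5,
    ("I", "B"): 0.3, ("I", "I"): 0.0,
}


def features(pos_tags: List[str], i: int) -> List[str]:
    """Extract simple context features for the word at position i."""
    feats = [f"pos={pos_tags[i]}"]
    if i == 0:
        feats.append("bos")
    else:
        feats.append(f"prev_pos={pos_tags[i - 1]}")
    return feats


def emission_score(feats: List[str], label: str) -> float:
    """Sum the weights of all active (feature, label) pairs."""
    return sum(EMISSION.get((f, label), 0.0) for f in feats)


def viterbi(pos_tags: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    n = len(pos_tags)
    if n == 0:
        return []
    # score[i][y]: best total score of any label prefix ending with label y at i.
    score = [{y: emission_score(features(pos_tags, 0), y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n):
        feats = features(pos_tags, i)
        score.append({})
        back.append({})
        for y in LABELS:
            best_prev = max(LABELS, key=lambda p: score[i - 1][p] + TRANSITION[(p, y)])
            score[i][y] = (score[i - 1][best_prev] + TRANSITION[(best_prev, y)]
                           + emission_score(feats, y))
            back[i][y] = best_prev
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: score[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["noun", "particle", "verb"]))  # prints ['B', 'I', 'B']
```

Under these toy weights the particle attaches to the preceding noun and the verb opens a new accent phrase. A real system would learn the weights from labeled data, such as the annotated sentence database the abstract describes, and would also need labels for nucleus positions.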
Masayuki SUZUKI
The University of Tokyo
Ryo KUROIWA
The University of Tokyo
Keisuke INNAMI
The University of Tokyo
Shumpei KOBAYASHI
The University of Tokyo
Shinya SHIMIZU
The University of Tokyo
Nobuaki MINEMATSU
The University of Tokyo
Keikichi HIROSE
The University of Tokyo
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Masayuki SUZUKI, Ryo KUROIWA, Keisuke INNAMI, Shumpei KOBAYASHI, Shinya SHIMIZU, Nobuaki MINEMATSU, and Keikichi HIROSE, "Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields," in IEICE TRANSACTIONS on Information and Systems, vol. E100-D, no. 4, pp. 655-661, April 2017, doi: 10.1587/transinf.2016AWI0004.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016AWI0004/_p
@ARTICLE{e100-d_4_655,
author={Masayuki SUZUKI and Ryo KUROIWA and Keisuke INNAMI and Shumpei KOBAYASHI and Shinya SHIMIZU and Nobuaki MINEMATSU and Keikichi HIROSE},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields},
year={2017},
volume={E100-D},
number={4},
pages={655--661},
abstract={When synthesizing speech from Japanese text, correctly assigning accent nuclei to input text of arbitrary content is essential for obtaining natural-sounding synthetic speech. A phenomenon called accent sandhi occurs in Japanese utterances: when a word is uttered in a sentence, its accent nucleus may change depending on the preceding and succeeding words. This paper describes a statistical method for automatically predicting the accent nucleus changes caused by accent sandhi. First, as the basis of the research, a database of Japanese text was constructed with labels of accent phrase boundaries and accent nucleus positions when uttered in sentences. A single native speaker of Tokyo dialect Japanese annotated all the labels for 6,344 Japanese sentences. Then, using this database, a method based on conditional random fields (CRFs) was developed to predict accent phrase boundaries and accent nuclei. The proposed method predicted accent nucleus positions for accent phrases with 94.66% accuracy, clearly surpassing the 87.48% accuracy obtained using our rule-based method. A listening experiment was also conducted to compare synthetic speech generated using the proposed method with that generated using the rule-based method. The results show that our method significantly improved the naturalness of synthetic speech.},
keywords={},
doi={10.1587/transinf.2016AWI0004},
ISSN={1745-1361},
month={April},}
TY  - JOUR
TI  - Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 655
EP  - 661
AU  - Masayuki SUZUKI
AU  - Ryo KUROIWA
AU  - Keisuke INNAMI
AU  - Shumpei KOBAYASHI
AU  - Shinya SHIMIZU
AU  - Nobuaki MINEMATSU
AU  - Keikichi HIROSE
PY  - 2017
DO  - 10.1587/transinf.2016AWI0004
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E100-D
IS  - 4
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2017/04//
AB  - When synthesizing speech from Japanese text, correctly assigning accent nuclei to input text of arbitrary content is essential for obtaining natural-sounding synthetic speech. A phenomenon called accent sandhi occurs in Japanese utterances: when a word is uttered in a sentence, its accent nucleus may change depending on the preceding and succeeding words. This paper describes a statistical method for automatically predicting the accent nucleus changes caused by accent sandhi. First, as the basis of the research, a database of Japanese text was constructed with labels of accent phrase boundaries and accent nucleus positions when uttered in sentences. A single native speaker of Tokyo dialect Japanese annotated all the labels for 6,344 Japanese sentences. Then, using this database, a method based on conditional random fields (CRFs) was developed to predict accent phrase boundaries and accent nuclei. The proposed method predicted accent nucleus positions for accent phrases with 94.66% accuracy, clearly surpassing the 87.48% accuracy obtained using our rule-based method. A listening experiment was also conducted to compare synthetic speech generated using the proposed method with that generated using the rule-based method. The results show that our method significantly improved the naturalness of synthetic speech.
ER  - 