Automatic prosody labeling is the task of automatically annotating prosodic labels such as syllable stresses or break indices into speech corpora. Prosody-labeled corpora are important for speech synthesis and automatic speech understanding. However, the subtleness of physical features makes accurate labeling difficult. Since errors in the prosodic labels can lead to incorrect prosody estimation and unnatural synthetic sound, the accuracy of the labels is a key factor for text-to-speech (TTS) systems. In particular, mora accent labels relevant to pitch are very important for Japanese, since Japanese is a pitch-accent language and Japanese people have a particularly keen sense of pitch accents. However, the determination of the mora accents of Japanese is a more difficult task than English stress detection in a way. This is because the context of words changes the mora accents within the word, which is different from English stress where the stress is normally put at the lexical primary stress of a word. In this paper, we propose a method that can accurately determine the prosodic labels of Japanese using both acoustic and linguistic models. A speaker-independent linguistic model provides mora-level knowledge about the possible correct accentuations in Japanese, and contributes to reduction of the required size of the speaker-dependent speech corpus for training the other stochastic models. Our experiments show the effectiveness of the combination of models.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Ryuki TACHIBANA, Tohru NAGANO, Gakuto KURATA, Masafumi NISHIMURA, Noboru BABAGUCHI, "Automatic Prosody Labeling Using Multiple Models for Japanese" in IEICE TRANSACTIONS on Information,
vol. E90-D, no. 11, pp. 1805-1812, November 2007, doi: 10.1093/ietisy/e90-d.11.1805.
Abstract: Automatic prosody labeling is the task of automatically annotating prosodic labels such as syllable stresses or break indices into speech corpora. Prosody-labeled corpora are important for speech synthesis and automatic speech understanding. However, the subtleness of physical features makes accurate labeling difficult. Since errors in the prosodic labels can lead to incorrect prosody estimation and unnatural synthetic sound, the accuracy of the labels is a key factor for text-to-speech (TTS) systems. In particular, mora accent labels relevant to pitch are very important for Japanese, since Japanese is a pitch-accent language and Japanese people have a particularly keen sense of pitch accents. However, the determination of the mora accents of Japanese is a more difficult task than English stress detection in a way. This is because the context of words changes the mora accents within the word, which is different from English stress where the stress is normally put at the lexical primary stress of a word. In this paper, we propose a method that can accurately determine the prosodic labels of Japanese using both acoustic and linguistic models. A speaker-independent linguistic model provides mora-level knowledge about the possible correct accentuations in Japanese, and contributes to reduction of the required size of the speaker-dependent speech corpus for training the other stochastic models. Our experiments show the effectiveness of the combination of models.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.11.1805/_p
Copy
@ARTICLE{e90-d_11_1805,
author={Ryuki TACHIBANA, Tohru NAGANO, Gakuto KURATA, Masafumi NISHIMURA, Noboru BABAGUCHI, },
journal={IEICE TRANSACTIONS on Information},
title={Automatic Prosody Labeling Using Multiple Models for Japanese},
year={2007},
volume={E90-D},
number={11},
pages={1805-1812},
abstract={Automatic prosody labeling is the task of automatically annotating prosodic labels such as syllable stresses or break indices into speech corpora. Prosody-labeled corpora are important for speech synthesis and automatic speech understanding. However, the subtleness of physical features makes accurate labeling difficult. Since errors in the prosodic labels can lead to incorrect prosody estimation and unnatural synthetic sound, the accuracy of the labels is a key factor for text-to-speech (TTS) systems. In particular, mora accent labels relevant to pitch are very important for Japanese, since Japanese is a pitch-accent language and Japanese people have a particularly keen sense of pitch accents. However, the determination of the mora accents of Japanese is a more difficult task than English stress detection in a way. This is because the context of words changes the mora accents within the word, which is different from English stress where the stress is normally put at the lexical primary stress of a word. In this paper, we propose a method that can accurately determine the prosodic labels of Japanese using both acoustic and linguistic models. A speaker-independent linguistic model provides mora-level knowledge about the possible correct accentuations in Japanese, and contributes to reduction of the required size of the speaker-dependent speech corpus for training the other stochastic models. Our experiments show the effectiveness of the combination of models.},
keywords={},
doi={10.1093/ietisy/e90-d.11.1805},
ISSN={1745-1361},
month={November},}
Copy
TY - JOUR
TI - Automatic Prosody Labeling Using Multiple Models for Japanese
T2 - IEICE TRANSACTIONS on Information
SP - 1805
EP - 1812
AU - Ryuki TACHIBANA
AU - Tohru NAGANO
AU - Gakuto KURATA
AU - Masafumi NISHIMURA
AU - Noboru BABAGUCHI
PY - 2007
DO - 10.1093/ietisy/e90-d.11.1805
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E90-D
IS - 11
JA - IEICE TRANSACTIONS on Information
Y1 - November 2007
AB - Automatic prosody labeling is the task of automatically annotating prosodic labels such as syllable stresses or break indices into speech corpora. Prosody-labeled corpora are important for speech synthesis and automatic speech understanding. However, the subtleness of physical features makes accurate labeling difficult. Since errors in the prosodic labels can lead to incorrect prosody estimation and unnatural synthetic sound, the accuracy of the labels is a key factor for text-to-speech (TTS) systems. In particular, mora accent labels relevant to pitch are very important for Japanese, since Japanese is a pitch-accent language and Japanese people have a particularly keen sense of pitch accents. However, the determination of the mora accents of Japanese is a more difficult task than English stress detection in a way. This is because the context of words changes the mora accents within the word, which is different from English stress where the stress is normally put at the lexical primary stress of a word. In this paper, we propose a method that can accurately determine the prosodic labels of Japanese using both acoustic and linguistic models. A speaker-independent linguistic model provides mora-level knowledge about the possible correct accentuations in Japanese, and contributes to reduction of the required size of the speaker-dependent speech corpus for training the other stochastic models. Our experiments show the effectiveness of the combination of models.
ER -