Break prediction is an important step in text-to-speech systems as break indices (BIs) have a great influence on how to correctly represent prosodic phrase boundaries. However, an accurate prediction is difficult since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to complement the prediction errors of an APB and MPB. First, we define a subtle BI in which it is difficult to decide between an APB and MPB clearly as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using the classification and regression tree, and multiple prosodic targets in relation to the pith and duration are then generated. Finally, unit-selection is conducted using multiple prosodic targets. The experimental results show that the proposed method improves the naturalness of synthesized speech.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Deok-Su NA, Myung-Jin BAE, "A Variable Break Prediction Method Using CART in a Japanese Text-to-Speech System" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 2, pp. 349-352, February 2009, doi: 10.1587/transinf.E92.D.349.
Abstract: Break prediction is an important step in text-to-speech systems as break indices (BIs) have a great influence on how to correctly represent prosodic phrase boundaries. However, an accurate prediction is difficult since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to complement the prediction errors of an APB and MPB. First, we define a subtle BI in which it is difficult to decide between an APB and MPB clearly as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using the classification and regression tree, and multiple prosodic targets in relation to the pith and duration are then generated. Finally, unit-selection is conducted using multiple prosodic targets. The experimental results show that the proposed method improves the naturalness of synthesized speech.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.349/_p
Copy
@ARTICLE{e92-d_2_349,
author={Deok-Su NA, Myung-Jin BAE, },
journal={IEICE TRANSACTIONS on Information},
title={A Variable Break Prediction Method Using CART in a Japanese Text-to-Speech System},
year={2009},
volume={E92-D},
number={2},
pages={349-352},
abstract={Break prediction is an important step in text-to-speech systems as break indices (BIs) have a great influence on how to correctly represent prosodic phrase boundaries. However, an accurate prediction is difficult since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to complement the prediction errors of an APB and MPB. First, we define a subtle BI in which it is difficult to decide between an APB and MPB clearly as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using the classification and regression tree, and multiple prosodic targets in relation to the pith and duration are then generated. Finally, unit-selection is conducted using multiple prosodic targets. The experimental results show that the proposed method improves the naturalness of synthesized speech.},
keywords={},
doi={10.1587/transinf.E92.D.349},
ISSN={1745-1361},
month={February},}
Copy
TY - JOUR
TI - A Variable Break Prediction Method Using CART in a Japanese Text-to-Speech System
T2 - IEICE TRANSACTIONS on Information
SP - 349
EP - 352
AU - Deok-Su NA
AU - Myung-Jin BAE
PY - 2009
DO - 10.1587/transinf.E92.D.349
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E92-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2009
AB - Break prediction is an important step in text-to-speech systems as break indices (BIs) have a great influence on how to correctly represent prosodic phrase boundaries. However, an accurate prediction is difficult since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to complement the prediction errors of an APB and MPB. First, we define a subtle BI in which it is difficult to decide between an APB and MPB clearly as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using the classification and regression tree, and multiple prosodic targets in relation to the pith and duration are then generated. Finally, unit-selection is conducted using multiple prosodic targets. The experimental results show that the proposed method improves the naturalness of synthesized speech.
ER -