Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis

Daiki SEKIZAWA; Shinnosuke TAKAMICHI; Hiroshi SARUWATARI

doi:10.1587/transinf.2018EDL8264

IEICE TRANSACTIONS on Information

Open Access

Summary:

This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. In the proposed model, to improve the naturalness while preserving the speaker individuality of Chinese-accented Japanese text-to-speech synthesis, we partially utilize HMM parameters of native Japanese speech to synthesize prosody-corrected synthetic speech. Results of an experimental evaluation demonstrate that duration and F₀ correction are significantly effective for improving naturalness.
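The idea of partially using native-speaker parameters can be illustrated with a toy sketch. It assumes each voice is reduced to three parameter streams (spectrum, F₀, duration) and that correction simply swaps selected prosody streams for the native ones. The class names, the function `correct_prosody`, and the direct swap are all hypothetical simplifications for illustration; the summary states that the method is based on partial model adaptation, and that procedure is not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class StreamParams:
    """Hypothetical container for one stream's Gaussian parameters."""
    mean: Tuple[float, ...]
    var: Tuple[float, ...]


@dataclass(frozen=True)
class VoiceModel:
    """Toy stand-in for an HMM voice: one parameter set per stream."""
    spectrum: StreamParams
    f0: StreamParams
    duration: StreamParams


PROSODY_STREAMS = ("f0", "duration")


def correct_prosody(accented: VoiceModel, native: VoiceModel,
                    streams: Tuple[str, ...] = PROSODY_STREAMS) -> VoiceModel:
    """Return a copy of `accented` whose listed prosody streams come from `native`.

    The spectrum stream is never replaced, so the accented speaker's
    spectral characteristics are kept (an assumption of this sketch).
    """
    unknown = set(streams) - set(PROSODY_STREAMS)
    if unknown:
        raise ValueError(f"not a prosody stream: {sorted(unknown)}")
    return replace(accented, **{name: getattr(native, name) for name in streams})


# Made-up numbers, for illustration only.
accented = VoiceModel(
    spectrum=StreamParams((0.1, 0.2), (1.0, 1.0)),
    f0=StreamParams((4.8,), (0.30,)),
    duration=StreamParams((7.0,), (2.0,)),
)
native = VoiceModel(
    spectrum=StreamParams((0.5, 0.6), (1.0, 1.0)),
    f0=StreamParams((5.0,), (0.10,)),
    duration=StreamParams((5.0,), (1.0,)),
)

corrected = correct_prosody(accented, native)
duration_only = correct_prosody(accented, native, streams=("duration",))
```

Passing `streams=("duration",)` or `streams=("f0",)` gives the single-stream variants, which is one way to compare duration correction and F₀ correction separately.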

Publication: IEICE TRANSACTIONS on Information Vol.E102-D No.6 pp.1218-1221

Publication Date: 2019/06/01

Publicized: 2019/03/11

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2018EDL8264

Type of Manuscript: LETTER

Category: Speech and Hearing

Authors

Daiki SEKIZAWA
  University of Tokyo
Shinnosuke TAKAMICHI
  University of Tokyo
Hiroshi SARUWATARI
  University of Tokyo

Keywords

HMM-based text-to-speech synthesis, non-native speech, Chinese-accented Japanese, prosody

Daiki SEKIZAWA, Shinnosuke TAKAMICHI, Hiroshi SARUWATARI, "Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 6, pp. 1218-1221, June 2019, doi: 10.1587/transinf.2018EDL8264.
Abstract: This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. In the proposed model, to improve the naturalness while preserving the speaker individuality of Chinese-accented Japanese text-to-speech synthesis, we partially utilize HMM parameters of native Japanese speech to synthesize prosody-corrected synthetic speech. Results of an experimental evaluation demonstrate that duration and F₀ correction are significantly effective for improving naturalness.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDL8264/_p

@ARTICLE{e102-d_6_1218,
author={Daiki SEKIZAWA and Shinnosuke TAKAMICHI and Hiroshi SARUWATARI},
journal={IEICE TRANSACTIONS on Information},
title={Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis},
year={2019},
volume={E102-D},
number={6},
pages={1218--1221},
abstract={This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. In the proposed model, to improve the naturalness while preserving the speaker individuality of Chinese-accented Japanese text-to-speech synthesis, we partially utilize HMM parameters of native Japanese speech to synthesize prosody-corrected synthetic speech. Results of an experimental evaluation demonstrate that duration and F₀ correction are significantly effective for improving naturalness.},
keywords={HMM-based text-to-speech synthesis, non-native speech, Chinese-accented Japanese, prosody},
doi={10.1587/transinf.2018EDL8264},
ISSN={1745-1361},
month={June},}

TY - JOUR
TI - Prosody Correction Preserving Speaker Individuality for Chinese-Accented Japanese HMM-Based Text-to-Speech Synthesis
T2 - IEICE TRANSACTIONS on Information
SP - 1218
EP - 1221
AU - Daiki SEKIZAWA
AU - Shinnosuke TAKAMICHI
AU - Hiroshi SARUWATARI
PY - 2019
DO - 10.1587/transinf.2018EDL8264
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2019
AB - This article proposes a prosody correction method based on partial model adaptation for Chinese-accented Japanese hidden Markov model (HMM)-based text-to-speech synthesis. Although text-to-speech synthesis built from non-native speech accurately reproduces the speaker's individuality in synthetic speech, the naturalness of the synthetic speech is strongly degraded. In the proposed model, to improve the naturalness while preserving the speaker individuality of Chinese-accented Japanese text-to-speech synthesis, we partially utilize HMM parameters of native Japanese speech to synthesize prosody-corrected synthetic speech. Results of an experimental evaluation demonstrate that duration and F₀ correction are significantly effective for improving naturalness.
ER -