IEICE TRANSACTIONS on Information

Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics

Yuji OSHIMA, Shinnosuke TAKAMICHI, Tomoki TODA, Graham NEUBIG, Sakriani SAKTI, Satoshi NAKAMURA


Summary :

This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or Hidden Markov Model (HMM)-based speech synthesis is a technique for synthesizing foreign-language speech from a target speaker's natural speech uttered in his/her mother tongue. Although the technique holds promise for improving a wide variety of applications, it tends to degrade the target speaker's individuality in synthetic speech compared with intra-lingual speech synthesis. This paper proposes a new approach to speech synthesis that preserves speaker individuality by using non-native speech spoken by the target speaker. Although the use of non-native speech makes it possible to preserve speaker individuality in the synthesized speech, naturalness is significantly degraded because the synthesized waveform directly reflects the unnatural prosody and pronunciation that often arise from differences between the linguistic systems of the source and target languages. To improve naturalness while preserving speaker individuality, we propose (1) a prosody correction method based on model adaptation, and (2) a phonetic correction method based on spectrum replacement for unvoiced consonants. Experimental results using English speech uttered by native Japanese speakers demonstrate that (1) the proposed methods significantly improve naturalness while preserving speaker individuality in synthetic speech, and (2) they also improve intelligibility, as confirmed by a dictation test.
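The phonetic correction idea named in the summary (spectrum replacement for unvoiced consonants) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two spectral parameter sequences generated for the same sentence, one from the non-native speaker's model and one from a native-speaker model, already aligned frame by frame (for example, generated with identical state durations), plus one phone label per frame. The function name `replace_unvoiced_spectra` and the ARPAbet set of English voiceless consonants are illustrative assumptions.

```python
from typing import List, Sequence, Set

# Illustrative assumption: ARPAbet symbols for English voiceless consonants.
UNVOICED_CONSONANTS: Set[str] = {"P", "T", "K", "F", "TH", "S", "SH", "CH", "HH"}


def replace_unvoiced_spectra(
    nonnative_spec: Sequence[Sequence[float]],
    native_spec: Sequence[Sequence[float]],
    frame_phones: Sequence[str],
    unvoiced: Set[str] = UNVOICED_CONSONANTS,
) -> List[List[float]]:
    """Hypothetical sketch: take unvoiced-consonant frames from the native
    model and keep every other frame from the non-native speaker's model.

    Assumes all three sequences are time-aligned and have the same length.
    """
    if not (len(nonnative_spec) == len(native_spec) == len(frame_phones)):
        raise ValueError("spectra and labels must have the same number of frames")

    output: List[List[float]] = []
    for nn_frame, nat_frame, phone in zip(nonnative_spec, native_spec, frame_phones):
        # Voiceless consonants carry little speaker identity, so swap them in
        # from the native model; voiced frames stay speaker-specific.
        source = nat_frame if phone in unvoiced else nn_frame
        output.append(list(source))  # copy so callers' data is never aliased
    return output


if __name__ == "__main__":
    nonnative = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    native = [[9.0, 9.0], [8.0, 8.0], [7.0, 7.0]]
    phones = ["AH", "S", "IY"]
    print(replace_unvoiced_spectra(nonnative, native, phones))
    # prints [[1.0, 1.0], [8.0, 8.0], [3.0, 3.0]]
```

Only the spectral frames are swapped in this sketch; how F0, durations, and boundary smoothing are handled in the actual method is not specified here.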

Publication
IEICE TRANSACTIONS on Information Vol.E99-D No.12 pp.3132-3139
Publication Date
2016/12/01
Publicized
2016/08/30
Online ISSN
1745-1361
DOI
10.1587/transinf.2016EDP7231
Type of Manuscript
PAPER
Category
Speech and Hearing

Authors

Yuji OSHIMA
  Nara Institute of Science and Technology
Shinnosuke TAKAMICHI
  The University of Tokyo
Tomoki TODA
  Nara Institute of Science and Technology, Nagoya University
Graham NEUBIG
  Nara Institute of Science and Technology
Sakriani SAKTI
  Nara Institute of Science and Technology
Satoshi NAKAMURA
  Nara Institute of Science and Technology

Keyword