Phoneme Power Control for Speech Synthesis

Kenzo ITOH, Tomohisa HIROKAWA, Hirokazu SATO

Summary:

This paper proposes a new method of phoneme power control for speech synthesis by rule. The method's innovation lies in its use of the phoneme environment and of the relationship between speech power and pitch frequency. First, the permissible threshold (PT) for power modification is measured in subjective experiments using power-manipulated speech material; the PT is found to be 4.1 dB. This result gives a criterion for the accuracy required of power control. Next, the relationship between speech power and pitch frequency is analyzed using a very large speech database. The results show that the relationship between phoneme power and pitch frequency depends on the kind of phoneme, the adjoining phonemes, whether the pitch is rising or falling, and whether the phoneme is in initial or final position in the sentence. Finally, we propose that phoneme power be controlled by pitch frequency and phoneme environment. The proposal is implemented in a waveform-concatenation text-to-speech synthesizer. The new method yields an average root-mean-square error between real and estimated speech power of 2.17 dB, which indicates that 94% of the estimated power values fall within the permissible threshold of human perception.
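
The step from a 2.17 dB error to "94% within the threshold" can be checked with a short calculation. The sketch below assumes the estimation errors are zero-mean and normally distributed with a standard deviation equal to the reported RMS error; the summary does not state this, so it is an assumption made here for illustration, and the function name is hypothetical. Under that assumption the fraction of errors inside the 4.1 dB threshold comes out at roughly 94%, consistent with the reported figure.

```python
import math


def fraction_within_threshold(rmse_db: float, threshold_db: float) -> float:
    """Fraction of errors with |error| < threshold_db.

    Assumption (not stated in the summary): errors are zero-mean Gaussian
    with standard deviation equal to rmse_db.
    """
    z = threshold_db / rmse_db          # threshold in units of standard deviation
    return math.erf(z / math.sqrt(2.0))  # P(|e| < threshold) for e ~ N(0, rmse_db^2)


PT_DB = 4.1     # permissible threshold reported in the summary
RMSE_DB = 2.17  # average RMS error reported in the summary

if __name__ == "__main__":
    share = fraction_within_threshold(RMSE_DB, PT_DB)
    print(f"Share of estimates within {PT_DB} dB: {share:.1%}")
```
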

Publication
IEICE TRANSACTIONS on Fundamentals Vol.E76-A No.11 pp.1911-1918
Publication Date
1993/11/25
Type of Manuscript
Special Section PAPER (Special Section on Speech Synthesis: Current Technologies and Their Application)