Physiologically-Based Speech Synthesis Using Neural Networks

Makoto HIRAYAMA, Eric VATIKIOTIS-BATESON, Mitsuo KAWATO

Summary:

This paper focuses on two areas in our effort to synthesize speech from neuromotor input using neural network models that effect transforms between cognitive intentions to speak, their physiological effects on vocal tract structures, and subsequent realization as acoustic signals. The first area concerns the biomechanical transform between motor commands to muscles and the ensuing articulator behavior. Using physiological data of muscle EMG (electromyography) and articulator movements during natural English speech utterances, three articulator-specific neural networks learn the forward dynamics relating motor commands to the muscles and the motion of the tongue, jaw, and lips. Compared to a fully-connected network mapping muscle EMG and motion for all three sets of articulators at once, this modular approach has improved performance by reducing network complexity and has eliminated some of the confounding influence of functional coupling among articulators. Network independence has also allowed us to identify and assess the effects of technical and empirical limitations on an articulator-by-articulator basis. This is particularly important for modeling the tongue, whose complex structure is very difficult to examine empirically. The second area of progress concerns the transform between articulator motion and the speech acoustics. From the articulatory movement trajectories, a second neural network generates PARCOR (partial correlation) coefficients, which are then used to synthesize the speech acoustics. In the current implementation, articulator velocities have been added as inputs to the network. As a result, the model now follows the fast changes of the coefficients for consonants generated by relatively slow articulatory movements during natural English utterances. Although much work still needs to be done, progress in these areas brings us closer to our goal of emulating speech production processes computationally.
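
As a rough illustration of the two-stage mapping described in the summary, the sketch below wires articulator-specific forward-dynamics networks (EMG to motion) into a second network that maps articulator positions and velocities to PARCOR coefficients, followed by a lattice synthesis filter. This is not the authors' implementation: the layer sizes, EMG channel counts, kinematic dimensions, PARCOR order, frame length, and the lattice sign convention are all illustrative assumptions.

```python
# A minimal sketch, assuming NumPy only; layer sizes, EMG channel counts,
# kinematic dimensions, PARCOR order, and frame length are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weight/bias pairs for a small feedforward network."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, layers):
    """Forward pass with tanh hidden units and a linear output layer."""
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(h @ W + b)
    W_out, b_out = layers[-1]
    return h @ W_out + b_out

# Stage 1: articulator-specific forward-dynamics networks (EMG -> motion).
emg_channels = {"tongue": 6, "jaw": 4, "lips": 4}   # assumed channel counts
motion_dims  = {"tongue": 4, "jaw": 2, "lips": 2}   # assumed kinematic dims
dynamics_nets = {a: init_mlp([emg_channels[a], 20, motion_dims[a]])
                 for a in emg_channels}

# Stage 2: articulator positions and velocities -> PARCOR coefficients.
n_motion = sum(motion_dims.values())
parcor_order = 12                                    # assumed analysis order
acoustic_net = init_mlp([2 * n_motion, 30, parcor_order])

def frame_to_parcor(emg, prev_motion, dt=0.005):
    """One frame: EMG -> articulator motion -> PARCOR reflection coeffs."""
    motion = np.concatenate([mlp_forward(emg[a], dynamics_nets[a])
                             for a in dynamics_nets])
    velocity = (motion - prev_motion) / dt           # velocity inputs
    k = np.tanh(mlp_forward(np.concatenate([motion, velocity]), acoustic_net))
    return motion, k                                 # k bounded to (-1, 1)

def parcor_synthesis(k_frames, excitation, frame_len=80):
    """All-pole lattice synthesis from per-frame reflection coefficients
    (one common sign convention; others differ only in the signs of k)."""
    order = k_frames.shape[1]
    g = np.zeros(order + 1)          # delayed backward prediction errors
    out = np.zeros(len(excitation))
    for n, e in enumerate(excitation):
        k = k_frames[min(n // frame_len, len(k_frames) - 1)]
        f = e
        for m in range(order, 0, -1):
            f = f + k[m - 1] * g[m - 1]
            g[m] = g[m - 1] - k[m - 1] * f
        g[0] = f
        out[n] = f
    return out

# Example: push one frame of random "EMG" through the whole pipeline.
emg = {a: rng.normal(size=c) for a, c in emg_channels.items()}
motion, k = frame_to_parcor(emg, prev_motion=np.zeros(n_motion))
speech = parcor_synthesis(k[None, :], excitation=rng.normal(size=80))
```

Squashing the second network's output with tanh keeps the reflection coefficients inside (-1, 1), which keeps the all-pole lattice stable regardless of what the upstream networks produce; in practice the networks would be trained on the EMG, movement, and PARCOR data rather than left at random initial weights.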

Publication: IEICE TRANSACTIONS on Fundamentals, Vol.E76-A, No.11, pp.1898-1910
Publication Date: 1993/11/25
Type of Manuscript: Special Section PAPER (Special Section on Speech Synthesis: Current Technologies and Their Application)