Speech output is expected to serve as an ideal man-machine interface, and producing emotion in it is an important issue, both for improving naturalness and for achieving more sophisticated speech interaction between man and machine. Speech has two aspects: prosodic information and phonetic features. The role of prosody in speech perception has been studied with a view to natural, high-quality speech synthesis. In this paper, the prosodic components that contribute to the expression of emotions and their intensity are clarified by analyzing emotional speech and by conducting listening tests on synthetic speech. The analysis is performed by substituting the components of neutral speech (i.e., speech with no particular emotion) with those of emotional speech, while preserving the temporal correspondence by means of dynamic time warping (DTW). It has been confirmed that the prosodic components, which consist of pitch structure, temporal structure, and amplitude structure, contribute to the expression of emotions more than the spectral structure of speech does. The results of listening tests using prosody-substituted speech show that temporal structure is the most important component for the expression of anger, while all three components are much more important for the intensity of anger. Pitch structure also plays a significant role in the expression of joy and sadness and in their intensity. These results make it possible to convert neutral utterances into utterances expressing various emotions. The results can also be applied to controlling the emotional characteristics of speech in synthesis by rule.
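The substitution step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dtw_path` and `substitute_contour`, the use of one scalar feature per frame, and the absolute-difference local cost are all assumptions made for illustration. The sketch aligns a neutral and an emotional utterance frame by frame with DTW, then maps one component of the emotional utterance (here a hypothetical pitch contour) onto the neutral utterance's time axis.

```python
def dtw_path(a, b):
    """Return a minimum-cost DTW alignment path between sequences a and b.

    a and b are lists of per-frame scalar features (an assumption; real
    systems typically align multi-dimensional spectral features).
    The path is a list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance (assumed metric)
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to recover the alignment path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path


def substitute_contour(path, source_values, n_target):
    """Map source_values onto the target time axis along the DTW path.

    For each target frame, average the source values aligned to it.
    """
    buckets = [[] for _ in range(n_target)]
    for i, j in path:
        buckets[i].append(source_values[j])
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Hypothetical per-frame features used only for alignment.
neutral_frames = [0.0, 1.0, 2.0, 3.0]
emotional_frames = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]
# Hypothetical pitch contour (Hz) of the emotional utterance.
emotional_f0 = [180.0, 190.0, 220.0, 240.0, 230.0, 200.0]

path = dtw_path(neutral_frames, emotional_frames)
neutral_with_emotional_f0 = substitute_contour(path, emotional_f0, len(neutral_frames))
```

Averaging the aligned source frames is one simple way to handle many-to-one alignments. The same mapping could be applied to an amplitude contour, whereas substituting temporal structure would instead resample the neutral frames onto the emotional time axis.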
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yoshinori KITAHARA, Yoh'ichi TOHKURA, "Prosodic Control to Express Emotions for Man-Machine Speech Interaction" in IEICE TRANSACTIONS on Fundamentals, vol. E75-A, no. 2, pp. 155-163, February 1992.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e75-a_2_155/_p
@ARTICLE{e75-a_2_155,
author={Yoshinori KITAHARA and Yoh'ichi TOHKURA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Prosodic Control to Express Emotions for Man-Machine Speech Interaction},
year={1992},
volume={E75-A},
number={2},
pages={155-163},
abstract={Speech output is expected to serve as an ideal man-machine interface, and producing emotion in it is an important issue, both for improving naturalness and for achieving more sophisticated speech interaction between man and machine. Speech has two aspects: prosodic information and phonetic features. The role of prosody in speech perception has been studied with a view to natural, high-quality speech synthesis. In this paper, the prosodic components that contribute to the expression of emotions and their intensity are clarified by analyzing emotional speech and by conducting listening tests on synthetic speech. The analysis is performed by substituting the components of neutral speech (i.e., speech with no particular emotion) with those of emotional speech, while preserving the temporal correspondence by means of dynamic time warping (DTW). It has been confirmed that the prosodic components, which consist of pitch structure, temporal structure, and amplitude structure, contribute to the expression of emotions more than the spectral structure of speech does. The results of listening tests using prosody-substituted speech show that temporal structure is the most important component for the expression of anger, while all three components are much more important for the intensity of anger. Pitch structure also plays a significant role in the expression of joy and sadness and in their intensity. These results make it possible to convert neutral utterances into utterances expressing various emotions. The results can also be applied to controlling the emotional characteristics of speech in synthesis by rule.},
month={February},}
TY - JOUR
TI - Prosodic Control to Express Emotions for Man-Machine Speech Interaction
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 155
EP - 163
AU - Yoshinori KITAHARA
AU - Yoh'ichi TOHKURA
PY - 1992
JO - IEICE TRANSACTIONS on Fundamentals
VL - E75-A
IS - 2
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - February 1992
AB - Speech output is expected to serve as an ideal man-machine interface, and producing emotion in it is an important issue, both for improving naturalness and for achieving more sophisticated speech interaction between man and machine. Speech has two aspects: prosodic information and phonetic features. The role of prosody in speech perception has been studied with a view to natural, high-quality speech synthesis. In this paper, the prosodic components that contribute to the expression of emotions and their intensity are clarified by analyzing emotional speech and by conducting listening tests on synthetic speech. The analysis is performed by substituting the components of neutral speech (i.e., speech with no particular emotion) with those of emotional speech, while preserving the temporal correspondence by means of dynamic time warping (DTW). It has been confirmed that the prosodic components, which consist of pitch structure, temporal structure, and amplitude structure, contribute to the expression of emotions more than the spectral structure of speech does. The results of listening tests using prosody-substituted speech show that temporal structure is the most important component for the expression of anger, while all three components are much more important for the intensity of anger. Pitch structure also plays a significant role in the expression of joy and sadness and in their intensity. These results make it possible to convert neutral utterances into utterances expressing various emotions. The results can also be applied to controlling the emotional characteristics of speech in synthesis by rule.
ER -