
Keyword Search Result

[Keyword] paralinguistic information (2 hits)

Hits 1-2
  • Facial Expression Generation from Speaker's Emotional States in Daily Conversation

    Hiroki MORI  Koh OHSHIMA  

     
    PAPER-Media Communication

    Vol: E91-D  No: 6  Page(s): 1628-1633

    A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former are represented by vectors with psychologically defined abstract dimensions, and the latter are coded by the Facial Action Coding System. To obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method is verified by a subjective evaluation test. As a result, the Mean Opinion Score for the suitability of the generated facial expressions was 3.86 for the speaker, which was close to that of hand-made facial expressions.

  • Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech

    Nick CAMPBELL  

     
    INVITED PAPER

    Vol: E88-D  No: 3  Page(s): 376-383

    This paper describes the special demands of conversational speech in the context of corpus-based speech synthesis. The author proposed the CHATR system of prosody-based unit selection for concatenative waveform synthesis seven years ago, and now extends this work to incorporate the results of an analysis of five years of recordings of spontaneous conversational speech in a wide range of actual daily-life situations. The paper proposes that the expression of affect (often translated as 'kansei' in Japanese) is the main factor differentiating laboratory speech from real-world conversational speech, and presents a framework for the specification of affect through differences in speaking style and voice quality. Having an enormous corpus of speech samples available for concatenation allows the selection of complete phrase-sized utterance segments, and shifts the focus of unit selection from segmental or phonetic continuity to prosodic and discoursal appropriateness. Samples of the resulting large-corpus-based synthesis can be heard at http://feast.his.atr.jp/AESOP.