
Author Search Result

[Author] Ryohei NAKATSU(3hit)

  • Use of Multimodal Information in Facial Emotion Recognition

    Liyanage C. DE SILVA  Tsutomu MIYASATO  Ryohei NAKATSU  

     
    PAPER-Artificial Intelligence and Cognitive Science

    Vol: E81-D No: 1
    Page(s): 105-114

    Detection of facial emotions is mainly addressed by computer vision researchers on the basis of facial display, while detection of vocal expressions of emotion is found in work by acoustic researchers. Most of these research paradigms are devoted purely to visual or purely to auditory human emotion detection. However, we found it very interesting to consider auditory and visual information together, since we expect this kind of multimodal information processing to become a basis for information processing in the future multimedia era. In several intensive subjective evaluation studies, we found that human beings recognize Anger, Happiness, Surprise and Dislike better from visual appearance than from voice alone. When the audio track of each emotion clip was dubbed with a different type of auditory emotional expression, Anger, Happiness and Surprise were still video dominant. However, the Dislike emotion gave mixed responses for different speakers. In both studies, we found that the Sadness and Fear emotions were audio dominant. In conclusion, we propose a method of facial emotion detection based on a hybrid approach, which uses multimodal information for facial emotion recognition.

  • An Evaluation of Visual Fatigue in 3-D Displays: Focusing on the Mismatching of Convergence and Accommodation

    Toshiaki SUGIHARA  Tsutomu MIYASATO  Ryohei NAKATSU  

     
    PAPER

    Vol: E82-C No: 10
    Page(s): 1814-1822

    In this paper, we describe an experimental evaluation of visual fatigue in a binocular disparity type 3-D display system. To evaluate this fatigue, we use a subjective assessment method and focus on the mismatch between convergence and accommodation, which is a major weakness of binocular disparity 3-D displays. For this subjective assessment, we use a newly developed binocular disparity 3-D display system with a compensation function for accommodation. Because this equipment allows us to compare conditions that differ only in the mismatch itself, the evaluation is more accurate than in similar previous work.

  • Automatic Evaluation of English Pronunciation Based on Speech Recognition Techniques

    Hiroshi HAMADA  Satoshi MIKI  Ryohei NAKATSU  

     
    PAPER-Speech Processing

    Vol: E76-D No: 3
    Page(s): 352-359

    A new method is proposed for automatically evaluating the English pronunciation quality of non-native speakers. It is assumed that pronunciation can be rated using three criteria: the static characteristics of phonetic spectra, the dynamic structure of spectrum sequences, and the prosodic characteristics of utterances. The evaluation uses speech recognition techniques to compare the English words pronounced by a non-native speaker with those pronounced by a native speaker. Three evaluation measures are proposed to rate pronunciation quality. (1) The standard deviation of the mapping vectors, which map the codebook vectors of the non-native speaker onto the vector space of the native speaker, is used to evaluate the static phonetic spectra characteristics. (2) The spectral distance between words pronounced by the non-native speaker and those pronounced by the native speaker obtained by the DTW method is used to evaluate the dynamic characteristics of spectral sequences. (3) The differences in fundamental frequency and speech power between the pronunciation of the native and non-native speaker are used as the criteria for evaluating prosodic characteristics. Evaluation experiments are carried out using 441 words spoken by 10 Japanese speakers and 10 native speakers. One half of the 441 words was used to evaluate static phonetic spectra characteristics, and the other half was used to evaluate the dynamic characteristics of spectral sequences, as well as the prosodic characteristics. Based on the experimental results, the correlation between the evaluation scores and the scores determined by human judgement is found to be 0.90.
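
As a rough illustration of measure (2) above, a DTW-based spectral distance between two utterances can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `dtw_distance`, the Euclidean frame distance, the symmetric step pattern, and the normalisation by the summed sequence lengths are all assumptions, since the abstract does not specify them.

```python
import math


def dtw_distance(a, b):
    """Length-normalised DTW distance between two sequences of feature vectors.

    a, b: non-empty lists of equal-dimension frames (e.g. spectral vectors).
    Assumptions (not from the abstract): Euclidean frame distance,
    steps (i-1, j), (i, j-1), (i-1, j-1), normalisation by len(a) + len(b).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # frame of b repeated
                                 cost[i][j - 1],      # frame of a repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m] / (n + m)


# Hypothetical one-dimensional "spectra": the same word spoken at two speeds.
native = [(0.0,), (1.0,), (2.0,)]
learner = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(native, learner))
```

Because the warping path absorbs differences in speaking rate, the two toy sequences above align with zero accumulated distance; only spectral differences that survive time alignment contribute to the score.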