
Author Search Result

[Author] Kenji OZAWA (5 hits)

  • Development of an Estimation Model for Instantaneous Presence in Audio-Visual Content

    Kenji OZAWA  Shota TSUKAHARA  Yuichiro KINOSHITA  Masanori MORISE  

     
    PAPER

      Publicized:
    2015/10/21
      Vol:
    E99-D No:1
      Page(s):
    120-127

    The sense of presence is often used to evaluate the performance of audio-visual (AV) content and systems. However, a presence meter has yet to be realized. We consider that the sense of presence can be divided into two aspects: system presence and content presence. In this study, we focused on content presence. To estimate the overall presence of a content item, we have developed estimation models for the sense of presence in audio-only and audio-visual content. In this study, the audio-visual model is expanded to estimate the instantaneous presence in an AV content item. Initially, we conducted an evaluation experiment of presence with 40 content items to investigate the relationship between the features of the AV content and the instantaneous presence. Based on the experimental data, a neural-network-based model was developed by expanding the previous model. To express the variation in instantaneous presence, 6 audio-related features and 14 visual-related features, which are extracted from the content items in 500-ms intervals, are used as inputs to the model. The audio-related features are loudness, sharpness, roughness, dynamic range and standard deviation of sound pressure levels, and movement of sound images. The visual-related features involve hue, lightness, saturation, and movement of visual images. After constructing the model, a generalization test confirmed that the model is sufficiently accurate to estimate the instantaneous presence. Hence, the model should contribute to the development of a presence meter.
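    The estimation pipeline described above (20 features per 500-ms frame fed to a neural network that outputs a presence score) can be sketched roughly as follows. The layer sizes, random weights, and the mapping onto a seven-point range are illustrative assumptions, not values from the paper:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    N_AUDIO, N_VISUAL = 6, 14
    N_IN = N_AUDIO + N_VISUAL   # 20 features per 500-ms frame
    N_HIDDEN = 10               # hidden-layer size is an assumption

    # Randomly initialized weights stand in for the trained model.
    W1 = rng.normal(scale=0.1, size=(N_IN, N_HIDDEN))
    b1 = np.zeros(N_HIDDEN)
    W2 = rng.normal(scale=0.1, size=(N_HIDDEN, 1))
    b2 = np.zeros(1)

    def estimate_presence(frames: np.ndarray) -> np.ndarray:
        """frames: (n_frames, 20) feature matrix -> presence score per frame."""
        h = np.tanh(frames @ W1 + b1)                 # hidden layer
        # Squash the output into the seven-point scale range [1, 7].
        return 4.0 + 3.0 * np.tanh(h @ W2 + b2).ravel()
    ```

    A trained model would replace the random weights; the point here is only the frame-wise shape of the computation: one 20-dimensional feature vector in, one instantaneous presence score out, every 500 ms.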

  • Muffled and Brisk Speech Evaluation with Criterion Based on Temporal Differentiation of Vocal Tract Area Function

    Masanori MORISE  Satoshi TSUZUKI  Hideki BANNO  Kenji OZAWA  

     
    LETTER-Speech and Hearing

      Publicized:
    2014/09/17
      Vol:
    E97-D No:12
      Page(s):
    3230-3233

    This research deals with muffled speech as the evaluation target and introduces a criterion for evaluating the auditory impression in muffled speech. It focuses on the vocal tract area function (VTAF) to evaluate the auditory impression, and the criterion uses temporal differentiation of this function to track the temporal variation of the shape of the mouth. The experimental results indicate that the proposed criterion can be used to evaluate the auditory impression as well as the subjective impression.
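    The core idea of the criterion, temporal differentiation of the vocal tract area function to track how the shape of the mouth changes over time, can be sketched as below. The aggregation into a single per-frame value (mean absolute change rate) and the frame period are assumptions for illustration:

    ```python
    import numpy as np

    def vtaf_temporal_diff(vtaf: np.ndarray, frame_period_s: float = 0.005) -> np.ndarray:
        """vtaf: (n_frames, n_sections) vocal tract areas -> change rate per frame pair."""
        d = np.diff(vtaf, axis=0) / frame_period_s    # temporal differentiation
        return np.mean(np.abs(d), axis=1)             # one value per frame transition
    ```

    A static vocal tract shape yields zero everywhere; brisk, clearly articulated speech would produce larger values than muffled speech, which is the intuition the criterion builds on.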

  • WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications Open Access

    Masanori MORISE  Fumiya YOKOMORI  Kenji OZAWA  

     
    PAPER-Speech and Hearing

      Publicized:
    2016/04/05
      Vol:
    E99-D No:7
      Page(s):
    1877-1884

    A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of real-time applications using speech. Speech analysis, manipulation, and synthesis based on vocoders are used in various kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. This new speech synthesis system offers not only high sound quality but also rapid processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech including consonants. Its processing speed was also compared with those of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and the real-time factor (RTF) indicated that it was fast enough for real-time processing.
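    The real-time factor mentioned above is a standard speed metric: processing time divided by the duration of the audio processed, with RTF < 1 meaning faster than real time. A minimal sketch (the example numbers are illustrative, not measurements from the paper):

    ```python
    def real_time_factor(processing_time_s: float, audio_duration_s: float) -> float:
        """RTF = time spent processing / duration of audio. RTF < 1 is faster than real time."""
        return processing_time_s / audio_duration_s

    # e.g. synthesizing 10 s of speech in 0.8 s of wall-clock time
    rtf = real_time_factor(0.8, 10.0)   # about 0.08, well below 1
    ```

    By this measure, "over ten times faster than the conventional systems" corresponds to an RTF roughly one tenth of theirs.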

  • Instantaneous Evaluation of the Sense of Presence in Audio-Visual Content

    Kenji OZAWA  Shota TSUKAHARA  Yuichiro KINOSHITA  Masanori MORISE  

     
    PAPER

      Vol:
    E98-D No:1
      Page(s):
    49-57

    The sense of presence is crucial for evaluating the performance of audio-visual (AV) equipment and content. Previously, the overall presence was evaluated for a set of AV content items by asking subjects to judge the presence of the entire content item. In this study, the sense of presence is evaluated as a time series using the method of continuous judgment by category. Specifically, the audio signals of 40 content items with durations of approximately 30 s each were recorded with a dummy head, and then presented as stimuli to subjects via headphones. The corresponding visual signals were recorded using a video camera in the full-HD format, and reproduced on a 65-inch display. In the experiments, 20 subjects evaluated the instantaneous sense of presence of each item on a seven-point scale under two conditions: audio-only or audio-visual. At the end of the time series, the subjects also evaluated the overall presence of the item using seven categories. Based on these results, the effects of visual information on the sense of presence were examined. The overall presence is highly correlated with the ten-percentile exceeded presence score, S10, which is the score that is exceeded for 10% of the time during the responses. Based on the instantaneous presence data in this study, we are one step closer to our ultimate goal of developing a real-time operational presence meter.
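    The S10 statistic has a simple reading: a score exceeded for 10% of the response time is the 90th percentile of the instantaneous time series. A minimal sketch of that interpretation:

    ```python
    import numpy as np

    def s10(scores: np.ndarray) -> float:
        """Ten-percentile exceeded presence score: the level exceeded 10% of the time,
        i.e. the 90th percentile of the instantaneous presence time series."""
        return float(np.percentile(scores, 90))
    ```

    This is the same family of statistic as the LN percentile levels used in noise measurement (e.g. L10), applied here to a presence time series instead of sound levels.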

  • RAMST-CNN: A Residual and Multiscale Spatio-Temporal Convolution Neural Network for Personal Identification with EEG

    Yuxuan ZHU  Yong PENG  Yang SONG  Kenji OZAWA  Wanzeng KONG  

     
    PAPER-Biometrics

      Publicized:
    2020/08/06
      Vol:
    E104-A No:2
      Page(s):
    563-571

    In this study, we propose a method to perform personal identification (PI) based on electroencephalogram (EEG) signals, where the network used is named the residual and multiscale spatio-temporal convolution neural network (RAMST-CNN). By combining several popular deep-learning techniques, including residual learning (RL), multi-scale grouping convolution (MGC), global average pooling (GAP), and batch normalization (BN), RAMST-CNN gains powerful spatio-temporal feature extraction ability and achieves task independence, avoiding the complexity of selecting and extracting features manually. Experiments were carried out on multiple datasets, and the results were compared with methods from other studies. The results show that the proposed method has higher recognition accuracy even though the network it is based on is lightweight.
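    Two of the ingredients named above, multi-scale temporal convolution (parallel kernels of different lengths over each EEG channel) and global average pooling, can be illustrated with a minimal numpy sketch. The kernel sizes and the moving-average kernels are assumptions for illustration; the actual network uses learned grouped convolutions:

    ```python
    import numpy as np

    def multiscale_gap(eeg: np.ndarray, kernel_sizes=(3, 5, 9)) -> np.ndarray:
        """eeg: (n_channels, n_samples) -> pooled feature per (scale, channel)."""
        feats = []
        for k in kernel_sizes:
            kernel = np.ones(k) / k                       # stand-in for a learned kernel
            conv = np.stack([np.convolve(ch, kernel, mode="valid") for ch in eeg])
            feats.append(conv.mean(axis=1))               # global average pooling per channel
        return np.concatenate(feats)                      # (n_scales * n_channels,)
    ```

    The multi-scale branches let short kernels capture fast EEG transients while longer kernels capture slower rhythms, and GAP collapses each feature map to one value, which keeps the parameter count low and the network lightweight.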