The search functionality is under construction.

Author Search Result

[Author] Hiroshi FUJIMURA(2hit)

1-2hit
  • Construction and Evaluation of a Large In-Car Speech Corpus

    Kazuya TAKEDA  Hiroshi FUJIMURA  Katsunobu ITOU  Nobuo KAWAGUCHI  Shigeki MATSUBARA  Fumitada ITAKURA  

     
    PAPER-Speech Corpora and Related Topics

      Vol:
    E88-D No:3
      Page(s):
    553-561

    In this paper, we discuss the construction of a large in-car spoken dialogue corpus and the result of its analysis. We have developed a system specially built into a Data Collection Vehicle (DCV) which supports the synchronous recording of multichannel audio data from 16 microphones that can be placed in flexible positions, multichannel video data from 3 cameras, and vehicle related data. Multimedia data has been collected for three sessions of spoken dialogue with different modes of navigation, during approximately a 60 minute drive by each of 800 subjects. We have characterized the collected dialogues across the three sessions. Some characteristics such as sentence complexity and SNR are found to differ significantly among the sessions. Linear regression analysis results also clarify the relative importance of various corpus characteristics.

  • Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones

    Weifeng LI  Tetsuya SHINDE  Hiroshi FUJIMURA  Chiyomi MIYAJIMA  Takanori NISHINO  Katunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Feature Extraction and Acoustic Medelings

      Vol:
    E88-D No:3
      Page(s):
    384-390

    This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) The method does not require a sensitive geometric layout, calibration of the sensors nor additional pre-processing for tracking the speech source; 2) System works in very small computation amounts; and 3) Regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be utilized to generate the estimated log spectrum in the recognition phase, where the speech of close-talking is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.