The search functionality is under construction.

Author Search Result

[Author] Takanobu NISHIURA(5hit)

1-5hit
  • AURORA-2J: An Evaluation Framework for Japanese Noisy Speech Recognition

    Satoshi NAKAMURA  Kazuya TAKEDA  Kazumasa YAMAMOTO  Takeshi YAMADA  Shingo KUROIWA  Norihide KITAOKA  Takanobu NISHIURA  Akira SASOU  Mitsunori MIZUMACHI  Chiyomi MIYAJIMA  Masakiyo FUJIMOTO  Toshiki ENDO  

     
    PAPER-Speech Corpora and Related Topics

      Vol:
    E88-D No:3
      Page(s):
    535-544

    This paper introduces an evaluation framework for Japanese noisy speech recognition named AURORA-2J. Speech recognition systems must still be improved to be robust to noisy environments, but this improvement requires development of the standard evaluation corpus and assessment technologies. Recently, the Aurora 2, 3 and 4 corpora and their evaluation scenarios have had significant impact on noisy speech recognition research. The AURORA-2J is a Japanese connected digits corpus and its evaluation scripts are designed in the same way as Aurora 2 with the help of European Telecommunications Standards Institute (ETSI) AURORA group. This paper describes the data collection, baseline scripts, and its baseline performance. We also propose a new performance analysis method that considers differences in recognition performance among speakers. This method is based on the word accuracy per speaker, revealing the degree of the individual difference of the recognition performance. We also propose categorization of modifications, applied to the original HTK baseline system, which helps in comparing the systems and in recognizing technologies that improve the performance best within the same category.

  • Multiple Sound Source Localization Based on Inter-Channel Correlation Using a Distributed Microphone System in a Real Environment

    Kook CHO  Hajime OKUMURA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Microphone Array

      Vol:
    E93-D No:9
      Page(s):
    2463-2471

    In real environments, the presence of ambient noise and room reverberations seriously degrades the accuracy in sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method of multiple sound source localization using a distributed microphone system that is a recording system with multiple microphones dispersed to a wide area. The proposed method localizes a sound source by finding the position that maximizes the accumulated correlation coefficient between multiple channel pairs. After the estimation of the first sound source, a typical pattern of the accumulated correlation for a single sound source is subtracted from the observed distribution of the accumulated correlation. Subsequently, the second sound source is searched again. To evaluate the effectiveness of the proposed method, experiments of two sound source localization were carried out in an office room. The result shows that sound source localization accuracy is about 99.7%. The proposed method could realize the multiple sound source localization robustly and stably.

  • Speech Enhancement for Laser Doppler Vibrometer Dealing with Unknown Irradiated Objects

    Chengkai CAI  Kenta IWAI  Takanobu NISHIURA  

     
    PAPER-Digital Signal Processing

      Pubricized:
    2022/09/30
      Vol:
    E106-A No:4
      Page(s):
    647-656

    The acquisition of distant sound has always been a hot research topic. Since sound is caused by vibration, one of the best methods for measuring distant sound is to use a laser Doppler vibrometer (LDV). This laser has high directivity, that enables it to acquire sound from far away, which is of great practical use for disaster relief and other situations. However, due to the vibration characteristics of the irradiated object itself and the reflectivity of its surface (or other reasons), the acquired sound is often lacking frequency components in certain frequency bands and is mixed with obvious noise. Therefore, when using LDV to acquire distant speech, if we want to recognize the actual content of the speech, it is necessary to enhance the acquired speech signal in some way. Conventional speech enhancement methods are not generally applicable due to the various types of degradation in observed speech. Moreover, while several speech enhancement methods for LDV have been proposed, they are only effective when the irradiated object is known. In this paper, we present a speech enhancement method for LDV that can deal with unknown irradiated objects. The proposed method is composed of noise reduction, pitch detection, power spectrum envelope estimation, power spectrum reconstruction, and phase estimation. Experimental results demonstrate the effectiveness of our method for enhancing the acquired speech with unknown irradiated objects.

  • Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria

    Yuki DENDA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Applications

      Vol:
    E91-D No:3
      Page(s):
    598-606

    This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer consists of two innovations. One of them is robust omnidirectional audio and visual features. The direction of arrival (DOA) estimation using an equilateral triangular microphone array, and human position estimation using an omnidirectional video camera extract the AV features. The other is a dynamic fusion of the AV features. The validity criterion, called the audio- or visual-localization counter, validates each audio- or visual-feature. The reliability criterion, called the speech arriving evaluator, acts as a dynamic weight to eliminate any prior statistical properties from its fusion procedure. The proposed localizer can compatibly achieve talker localization in a speech activity and user localization in a non-speech activity under the identical fusion rule. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the talker localization performance of the proposed AV localizer using the validity and reliability criteria is superior to that of conventional localizers.

  • Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation

    Yuki DENDA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Speech Enhancement

      Vol:
    E89-D No:3
      Page(s):
    1050-1057

    This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech by using a microphone array. The proposed method consists of two algorithms: One is a TDOA (Time Delay Of Arrival) estimation algorithm based on a weighted CSP (Cross-power Spectrum Phase) analysis with an average speech spectrum and CSP coefficient subtraction. The other is a talker direction estimation algorithm based on ML (Maximum Likelihood) estimation in a time sequence of the estimated TDOAs. To evaluate the effectiveness of the proposed method, talker direction estimation experiments were carried out in an actual office room. The results confirmed that the talker direction estimation performance of the proposed method is superior to that of the conventional methods in both diffused- and directional-noise environments.