1-3hit |
Masashi OKADA Nobuyuki IWANAGA Tomoya MATSUMURA Takao ONOYE Wataru KOBAYASHI
In this paper, we propose a new 3D sound rendering method for multiple sound sources with limited computational resources. The method is based on fuzzy clustering, which achieves dual benefits of two general methods based on amplitude-panning and hard clustering. In embedded systems where the number of reproducible sound sources is restricted, the general methods suffer from localization errors and/or serious quality degradation, whereas the proposed method settles the problems by executing clustering-process and amplitude-panning simultaneously. Computational cost evaluation based on DSP implementation and subjective listening test have been performed to demonstrate the applicability for embedded systems and the effectiveness of the proposed method.
Kook CHO Hajime OKUMURA Takanobu NISHIURA Yoichi YAMASHITA
In real environments, the presence of ambient noise and room reverberations seriously degrades the accuracy in sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method of multiple sound source localization using a distributed microphone system that is a recording system with multiple microphones dispersed to a wide area. The proposed method localizes a sound source by finding the position that maximizes the accumulated correlation coefficient between multiple channel pairs. After the estimation of the first sound source, a typical pattern of the accumulated correlation for a single sound source is subtracted from the observed distribution of the accumulated correlation. Subsequently, the second sound source is searched again. To evaluate the effectiveness of the proposed method, experiments of two sound source localization were carried out in an office room. The result shows that sound source localization accuracy is about 99.7%. The proposed method could realize the multiple sound source localization robustly and stably.
Panikos HERACLEOUS Satoshi NAKAMURA Takeshi YAMADA Kiyohiro SHIKANO
This paper describes a method for hands-free speech recognition, and particularly for the simultaneous recognition of multiple sound sources. The method is based on the 3-D Viterbi search, i.e., extended to the 3-D N-best search method enabling the recognition of multiple sound sources. The baseline system integrates two existing technologies--3-D Viterbi search and conventional N-best search--into a complete system. Previously, the first evaluation of the 3-D N-best search-based system showed that new ideas are necessary to develop a system for the simultaneous recognition of multiple sound sources. It found two factors that play important roles in the performance of the system, namely the different likelihood ranges of the sound sources and the direction-based separation of the hypotheses. In order to solve these problems, we implemented a likelihood normalization and a path distance-based clustering technique into the baseline 3-D N-best search-based system. The performance of our system was evaluated through experiments on simulated data for the case of two talkers. The experiments showed significant improvements by implementing the above two techniques. The best results were obtained by implementing the two techniques and using a microphone array composed of 32 channels. More specifically, the Word Accuracy for the two talkers was higher than 80% and the Simultaneous Word Accuracy (where both sources are correctly recognized simultaneously) was higher than 70%, which are very promising results.