Yusuke MIZUNO Kazunobu KONDO Takanori NISHINO Norihide KITAOKA Kazuya TAKEDA
Blind source separation is a technique that can separate sound sources without such information as source location, the number of sources, and the utterance content. Multi-channel source separation using many microphones separates signals with high accuracy, even if there are many sources. However, these methods have extremely high computational complexity, which must be reduced. In this paper, we propose a computational complexity reduction method for blind source separation based on frequency domain independent component analysis (FDICA) and examine which temporal data are effective for source separation. A frame with many sound sources is effective for FDICA source separation. We assume that a frame with a low kurtosis has many sound sources and preferentially select such frames. In our proposed method, we used the log power spectrum and the kurtosis of the magnitude distribution of the observed data as selection criteria and conducted source separation experiments using speech signals from twelve speakers. We evaluated the separation performance by the signal-to-interference ratio (SIR) improvement score. The SIR improvement score was 24.3 dB when all the frames were used, and 23.3 dB when the 300 frames selected by our criteria were used. These results show that our proposed selection criteria based on kurtosis and magnitude are effective. Furthermore, we significantly reduced the computational complexity because it is proportional to the number of selected frames.
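The frame selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 40 dB activity floor, and the way log power and kurtosis are combined (power as a gate, kurtosis as the ranking) are assumptions made only for illustration, since the abstract does not state how the two criteria are combined.

```python
import numpy as np

def frame_kurtosis(mag):
    """Kurtosis of each frame's magnitude distribution over frequency bins.

    mag: real array of shape (n_frames, n_bins).
    Uses the plain (non-excess) kurtosis m4 / m2^2.
    """
    centered = mag - mag.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)
    m4 = (centered ** 4).mean(axis=1)
    return m4 / np.maximum(m2 ** 2, 1e-12)  # guard against flat frames

def select_frames(spec, n_select=300, power_floor_db=-40.0):
    """Return sorted indices of up to n_select low-kurtosis frames.

    spec: complex STFT of one observed channel, shape (n_frames, n_bins).
    power_floor_db is a hypothetical threshold, relative to the loudest
    frame, used here to discard near-silent frames before ranking.
    """
    mag = np.abs(spec)
    log_power = 10.0 * np.log10((mag ** 2).mean(axis=1) + 1e-12)
    active = np.flatnonzero(log_power >= log_power.max() + power_floor_db)
    # Low kurtosis is taken as a sign that many sources overlap in the frame.
    order = np.argsort(frame_kurtosis(mag[active]))
    return np.sort(active[order[:n_select]])
```

Only the selected frames would then be passed to the FDICA update, which is where the saving comes from if the cost is proportional to the number of frames used.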
Motoki OGASAWARA Takanori NISHINO Kazuya TAKEDA
The separation and localization of sound source signals are important techniques for many applications, such as highly realistic communication and speech recognition systems. These systems are expected to work without such prior information as the number of sound sources and the environmental conditions. In this paper, we developed a dodecahedral microphone array and proposed a novel separation method that uses it. This method draws on human sound localization cues and uses the acoustical characteristics that result from the shape of the dodecahedral microphone array. Moreover, it includes a method for estimating the number of sound sources that can operate without prior information. The sound source separation performance was evaluated under simulated and actual reverberant conditions, and the results were compared with those of the conventional method. The experimental results showed that our method outperformed the conventional method in separation performance.
Rajkishore PRASAD Hiroshi SARUWATARI Kiyohiro SHIKANO
This paper presents a study on the blind separation of a convolutive mixture of speech signals using a Frequency Domain Independent Component Analysis (FDICA) algorithm based on the negentropy maximization of the Time Frequency Series of Speech (TFSS). Comparative studies on the negentropy approximation of TFSS using generalized Higher Order Statistics (HOS) of different nonquadratic, nonlinear functions are presented. A new nonlinear function based on the statistical modeling of TFSS by exponential power functions is also proposed. The standard error and bias in the approximation of the negentropy of TFSS by different nonlinear functions, estimated using the sequential delete-one jackknifing method, together with their signal separation performance, indicate the superior performance of the exponential-power-based nonlinear function. The proposed nonlinear function was found to speed up convergence, with a slight improvement in the separation quality under reverberant conditions.
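As a rough illustration of negentropy approximation with a nonquadratic function G, the sketch below uses the widely known form J(y) ≈ (E[G(y)] − E[G(ν)])², where ν is a standard Gaussian variable. It is an assumption-laden simplification: it treats real-valued, standardized samples rather than the complex-valued TFSS, the function names are hypothetical, and the two G functions shown are common textbook choices, not the exponential-power-based function proposed in the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(G, deg=64):
    """E[G(v)] for v ~ N(0, 1), computed by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(deg)  # weight function exp(-x^2 / 2)
    return float(np.sum(weights * G(nodes)) / np.sqrt(2.0 * np.pi))

def negentropy_approx(y, G):
    """Approximate negentropy of real samples y as (E[G(y)] - E[G(v)])^2.

    y is standardized to zero mean and unit variance first, so that it
    is compared against a Gaussian with the same first two moments.
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    return float((np.mean(G(y)) - gaussian_expectation(G)) ** 2)

def logcosh(u):
    """Common nonquadratic choice G(u) = log cosh(u)."""
    return np.log(np.cosh(u))

def neg_gauss(u):
    """Common nonquadratic choice G(u) = -exp(-u^2 / 2)."""
    return -np.exp(-u ** 2 / 2.0)
```

Under these assumptions, samples from a heavy-tailed distribution such as the Laplacian should give a clearly larger value than Gaussian samples of the same size, for which the value stays near zero.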