Author Search Result

[Author] Shoko ARAKI (6 hits)

  • Blind Source Separation for Moving Speech Signals Using Blockwise ICA and Residual Crosstalk Subtraction

    Ryo MUKAI  Hiroshi SAWADA  Shoko ARAKI  Shoji MAKINO  

     
    PAPER-Speech/Acoustic Signal Processing

    Vol: E87-A No:8  Page(s): 1941-1948

    This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency-domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for the performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in real time.
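
    As a rough illustration of the second-stage postprocessing, the sketch below applies non-stationary spectral subtraction: an estimated crosstalk magnitude is removed from the separated spectrum frame by frame. It is a minimal sketch assuming NumPy arrays of STFT magnitudes; the variable names and the over-subtraction/flooring parameters are illustrative and not taken from the paper.

        import numpy as np

        def residual_crosstalk_subtraction(sep_mag, crosstalk_mag, alpha=1.0, floor=0.05):
            """Subtract an estimated crosstalk magnitude from a separated signal,
            frame by frame (non-stationary spectral subtraction).

            sep_mag, crosstalk_mag: arrays of shape (frames, freq_bins) holding
            STFT magnitudes of the separated output and the estimated crosstalk.
            alpha is an over-subtraction factor and floor a spectral floor;
            both are illustrative defaults, not values from the paper.
            """
            cleaned = sep_mag - alpha * crosstalk_mag
            # Flooring: keep a small fraction of the original magnitude instead of
            # letting the result go negative (a common source of musical noise).
            return np.maximum(cleaned, floor * sep_mag)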

  • Integration of Spatial Cue-Based Noise Reduction and Speech Model-Based Source Restoration for Real Time Speech Enhancement

    Tomoko KAWASE  Kenta NIWA  Masakiyo FUJIMOTO  Kazunori KOBAYASHI  Shoko ARAKI  Tomohiro NAKATANI  

     
    PAPER-Digital Signal Processing

    Vol: E100-A No:5  Page(s): 1127-1136

    We propose a microphone array speech enhancement method that integrates spatial-cue-based source power spectral density (PSD) estimation and statistical speech-model-based PSD estimation. The goal of this research was to clearly pick up target speech even in noisy environments such as crowded places, factories, and cars running at high speed. Beamforming with post-Wiener filtering is commonly used in many conventional studies on microphone-array noise reduction. For calculating a Wiener filter, speech and noise PSDs are essential, and they are estimated using spatial cues obtained from the microphone observations. Assuming that the sound sources are sparse in the temporal-spatial domain, speech and noise PSDs may be estimated accurately. However, PSD estimation errors increase under circumstances where this assumption does not hold. In this study, we integrated speech models with a PSD-estimation-in-beamspace method to correct speech and noise PSD estimation errors. A roughly estimated noise PSD is obtained frame by frame by analyzing spatial cues from the array observations. By combining this noise PSD with a statistical model of clean speech, the relationship between the PSD of the observed signal and that of the target speech, hereafter called the observation model, can be described without pre-training. By exploiting Bayes' theorem, a Wiener filter is statistically generated from the observation models. Experiments conducted to evaluate the proposed method showed that the signal-to-noise ratio and naturalness of the output speech signal were significantly better than those obtained with conventional methods.
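
    The post-Wiener filter built from estimated speech and noise PSDs is the common core of this kind of pipeline. The following is a minimal sketch, assuming NumPy and per-time-frequency PSD estimates already available (e.g., from a spatial-cue-based estimator); it shows only the generic gain, not the paper's Bayesian observation-model construction.

        import numpy as np

        def wiener_postfilter(beamformer_stft, speech_psd, noise_psd, eps=1e-12):
            """Apply a per-time-frequency Wiener gain to a beamformer output.

            beamformer_stft: complex STFT, shape (frames, freq_bins).
            speech_psd, noise_psd: estimated PSDs of target speech and noise,
            same shape (assumed here to come from some upstream estimator).
            """
            gain = speech_psd / (speech_psd + noise_psd + eps)  # classic Wiener gain
            return gain * beamformer_stft

    In the paper's setting the two PSDs would come from the beamspace estimation combined with the clean-speech model; here they are simply passed in.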

  • Blind Source Separation of Convolutive Mixtures of Speech in Frequency Domain

    Shoji MAKINO  Hiroshi SAWADA  Ryo MUKAI  Shoko ARAKI  

     
    INVITED PAPER

    Vol: E88-A No:7  Page(s): 1640-1655

    This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results for 2 × 2, 3 × 3, 4 × 4, 6 × 8, and 2 × 2 (moving sources) setups (#sources × #microphones) in a room are promising.
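
    The overall flow of frequency-domain BSS (STFT, an ICA run independently in every frequency bin, then scaling correction) can be sketched as follows. This is a skeleton assuming NumPy, an equal number of sources and microphones, and a user-supplied complex-valued ICA routine complex_ica(); permutation alignment across bins, a central topic of the paper, is deliberately omitted.

        import numpy as np

        def frequency_domain_bss(X, complex_ica):
            """Skeleton of frequency-domain BSS (determined case, n_src == n_mics).

            X: observed STFTs, shape (freq_bins, frames, n_mics).
            complex_ica: user-supplied function returning an unmixing matrix W of
            shape (n_src, n_mics) for complex-valued data of shape (frames, n_mics).
            """
            Y = np.zeros_like(X)
            for f in range(X.shape[0]):             # ICA separately in each bin
                W = complex_ica(X[f])
                # Scaling fix in the minimal-distortion style: rescale each row of W
                # with the corresponding diagonal element of W^{-1}.
                W = np.diag(np.diag(np.linalg.inv(W))) @ W
                Y[f] = X[f] @ W.T
            return Y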

  • Subband-Based Blind Separation for Convolutive Mixtures of Speech

    Shoko ARAKI  Shoji MAKINO  Robert AICHNER  Tsuyoki NISHIKAWA  Hiroshi SARUWATARI  

     
    PAPER-Engineering Acoustics

    Vol: E88-A No:12  Page(s): 3593-3603

    We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.
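
    The subband structure described above (split each microphone signal with an analysis filter bank, run a separate BSS adaptation in each subband, then resynthesize) is sketched below. It assumes NumPy plus user-supplied analysis/synthesis filter banks and a per-band BSS routine; all three are placeholders, since the paper's polyphase filter bank and FIR-filter adaptation are not reproduced here.

        import numpy as np

        def subband_bss(mic_signals, analysis_bank, bss_per_band, synthesis_bank):
            """Skeleton of subband-based BSS (interfaces are assumed, not the paper's).

            mic_signals: array (n_mics, n_samples) of time-domain observations.
            analysis_bank(x)  -> list of K decimated subband signals for one channel.
            bss_per_band(Xk)  -> separated subband signals for band k (this is where
                                 band-dependent choices such as longer unmixing
                                 filters at low frequencies would go).
            synthesis_bank(Y) -> time-domain sources rebuilt from K separated subbands.
            """
            subbands = [analysis_bank(x) for x in mic_signals]   # n_mics x K bands
            K = len(subbands[0])
            separated = []
            for k in range(K):
                Xk = np.stack([subbands[m][k] for m in range(len(mic_signals))])
                separated.append(bss_per_band(Xk))               # BSS in band k
            return synthesis_bank(separated)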

  • Polar Coordinate Based Nonlinear Function for Frequency-Domain Blind Source Separation

    Hiroshi SAWADA  Ryo MUKAI  Shoko ARAKI  Shoji MAKINO  

     
    PAPER-Convolutive Systems

    Vol: E86-A No:3  Page(s): 590-596

    This paper discusses a nonlinear function for independent component analysis to process complex-valued signals in frequency-domain blind source separation. Conventionally, nonlinear functions based on the Cartesian coordinates are widely used. However, such functions have a convergence problem. In this paper, we propose a more appropriate nonlinear function that is based on the polar coordinates of a complex number. In addition, we show that the difference between the two types of functions arises from the assumed densities of independent components. Our discussion is supported by several experimental results for separating speech signals, which show that the polar type nonlinear functions behave better than the Cartesian type.
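
    The contrast between the two nonlinearities can be stated in a few lines. The sketch below, assuming NumPy, implements a Cartesian-type function that applies tanh to the real and imaginary parts separately, and a polar-type function that applies tanh to the magnitude while preserving the phase; this is one common form, and the exact function and scaling used in the paper may differ.

        import numpy as np

        def cartesian_nonlinearity(y):
            """Cartesian type: tanh applied to real and imaginary parts separately."""
            return np.tanh(np.real(y)) + 1j * np.tanh(np.imag(y))

        def polar_nonlinearity(y):
            """Polar type: tanh applied to the magnitude, phase left untouched."""
            return np.tanh(np.abs(y)) * np.exp(1j * np.angle(y))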

  • Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation

    Audrey BLIN  Shoko ARAKI  Shoji MAKINO  

     
    PAPER-Blind Source Separation

    Vol: E88-A No:7  Page(s): 1693-1700

    This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal is assumed to exist. The major issue encountered in previous work is distortion, which affects the separated signals in the form of loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve the separation. Experimental results show that this combination of time-frequency masking and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without the estimated mixing matrix, in reverberant conditions where the reverberation time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is reduced far more by the proposed method than by the classical approaches.
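
    The sparseness-based first stage (classify each time-frequency point by a spatial feature and build one binary mask per source) can be sketched as below, assuming NumPy. The inter-sensor phase-difference feature, the nearest-cluster assignment, and the reference phase values are simplified placeholders rather than the paper's geometrical single-source detection and mixing-matrix estimation.

        import numpy as np

        def binary_mask_separation(X1, X2, source_phases):
            """Toy time-frequency masking for two sensors and a few sources.

            X1, X2: complex STFTs of the two sensors, shape (frames, freq_bins).
            source_phases: expected inter-sensor phase-difference value per source
            (an assumed stand-in for the geometrically detected source directions).
            """
            # Spatial feature: phase difference between the two observations.
            feature = np.angle(X2 / (X1 + 1e-12))
            # Assign each T-F point to the nearest expected phase difference.
            dist = np.abs(feature[..., None] - np.asarray(source_phases))
            nearest = np.argmin(dist, axis=-1)
            # One binary mask per source; each estimate is the masked sensor-1 signal.
            return [(nearest == k) * X1 for k in range(len(source_phases))]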