The search functionality is under construction.

Keyword Search Result

[Keyword] convolutive (11 hits)

  • Automatic Model Order Selection for Convolutive Non-Negative Matrix Factorization

    Yinan LI  Xiongwei ZHANG  Meng SUN  Chong JIA  Xia ZOU  

    LETTER-Speech and Hearing

    E99-A No:10

    Finding a parsimonious model that is just sufficient to represent the temporal dependency of time-series signals such as audio or speech is a practical requirement in many signal processing applications. A well-suited method for representing magnitude spectra intuitively and efficiently is convolutive non-negative matrix factorization (CNMF), which discovers the temporal relationships among nearby frames. However, the model order selection problem in CNMF, i.e., the choice of the number of convolutive bases, has seldom been investigated. In this paper, we propose a novel Bayesian framework that can automatically learn the optimal model order through maximum a posteriori (MAP) estimation. The proposed method yields a parsimonious, low-rank approximation by iteratively removing the redundant bases. We conducted intuitive experiments to show that the proposed algorithm is very effective in automatically determining the correct model order.

  • Online Convolutive Non-Negative Bases Learning for Speech Enhancement

    Yinan LI  Xiongwei ZHANG  Meng SUN  Yonggang HU  Li LI  

    LETTER-Speech and Hearing

    E99-A No:8

    An online version of convolutive non-negative sparse coding (CNSC) with the generalized Kullback-Leibler (K-L) divergence is proposed to adaptively learn spectral-temporal bases from speech streams. The proposed scheme processes training data piece-by-piece and incrementally updates learned bases with accumulated statistics to overcome the inefficiency of its offline counterpart in processing large scale or streaming data. Compared to conventional non-negative sparse coding, we utilize the convolutive model within bases, so that each basis is capable of describing a relatively long temporal span of signals, which helps to improve the representation power of the model. Moreover, by incorporating a voice activity detector (VAD), we propose an unsupervised enhancement algorithm that updates the noise dictionary adaptively from non-speech intervals. Meanwhile, for the speech intervals, one can adaptively learn the speech bases by keeping the noise ones fixed. Experimental results show that the proposed algorithm outperforms the competing algorithms substantially, especially when the background noise is highly non-stationary.

  • Derivation of Update Rules for Convolutive NMF Based on Squared Euclidean Distance, KL Divergence, and IS Divergence

    Hiroki TANJI  Ryo TANAKA  Kyohei TABATA  Yoshito ISEKI  Takahiro MURAKAMI  Yoshihisa ISHIDA  


    E97-A No:11

    In this paper, we present update rules for convolutive nonnegative matrix factorization (NMF) in which cost functions are based on the squared Euclidean distance, the Kullback-Leibler (KL) divergence and the Itakura-Saito (IS) divergence. We define an auxiliary function for each cost function and derive the update rule. We also apply this method to the single-channel signal separation in speech signals. Experimental results showed that the convergence of our KL divergence-based method was better than that in the conventional method, and our method achieved single-channel signal separation successfully.
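
    The squared-Euclidean case can be illustrated with a minimal sketch. It assumes the commonly used convolutive model V ≈ Σ_t W_t · shift(H, t) and the standard ratio-form multiplicative updates; it is not necessarily the auxiliary-function rule derived in the paper, and the names `shift`, `reconstruct`, and `cnmf_euclidean` are hypothetical.

```python
import numpy as np

def shift(X, t):
    """Shift the columns of X right by t (left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out

def reconstruct(W, H):
    """Convolutive model: sum over t of W[t] @ shift(H, t).
    W has shape (T, F, R); H has shape (R, N)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

def cnmf_euclidean(V, R, T, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for CNMF under the squared Euclidean cost
    (illustrative sketch; assumes V is non-negative and T < V.shape[1])."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((T, F, R)) + eps
    H = rng.random((R, N)) + eps
    errors = []
    for _ in range(n_iter):
        # Update H using contributions from all time lags at once.
        L = reconstruct(W, H)
        num = sum(W[t].T @ shift(V, -t) for t in range(T))
        den = sum(W[t].T @ shift(L, -t) for t in range(T)) + eps
        H *= num / den
        # Update each lag's basis matrix, refreshing the model every time.
        for t in range(T):
            L = reconstruct(W, H)
            Ht = shift(H, t)
            W[t] *= (V @ Ht.T) / (L @ Ht.T + eps)
        errors.append(np.linalg.norm(V - reconstruct(W, H)) ** 2)
    return W, H, errors
```

    Because every update multiplies by a ratio of non-negative terms, W and H stay non-negative when initialized that way; `errors` records the squared reconstruction error after each iteration so that convergence can be inspected.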

  • Time-Domain Blind Signal Separation of Convolutive Mixtures via Multidimensional Independent Component Analysis

    Takahiro MURAKAMI  Toshihisa TANAKA  Yoshihisa ISHIDA  


    E92-A No:3

    An algorithm for blind signal separation (BSS) of convolutive mixtures is presented. In this algorithm, the BSS problem is treated as multidimensional independent component analysis (ICA) by introducing an extended signal vector which is composed of current and previous samples of signals. It is empirically known that a number of conventional ICA algorithms solve the multidimensional ICA problem up to permutation and scaling of signals. In this paper, we give theoretical justification for using any conventional ICA algorithm. Then, we discuss the remaining problems, i.e., permutation and scaling of signals. To solve the permutation problem, we propose a simple algorithm which classifies the signals obtained by a conventional ICA algorithm into mutually independent subsets by utilizing temporal structure of the signals. For the scaling problem, we prove that the method proposed by Koldovský and Tichavský is theoretically proper in respect of estimating filtered versions of source signals which are observed at sensors.

  • A Distortion-Free Learning Algorithm for Feedforward Multi-Channel Blind Source Separation

    Akihide HORITA  Kenji NAKAYAMA  Akihiro HIRANO  

    PAPER-Digital Signal Processing

    E90-A No:12

    FeedForward (FF-) Blind Source Separation (BSS) systems have some degree of freedom in the solution space. Therefore, signal distortion is likely to occur. First, a criterion for the signal distortion is discussed. Properties of conventional methods proposed to suppress the signal distortion are analyzed. Next, a general condition for complete and distortion-free separation is derived for multi-channel FF-BSS systems. This condition is incorporated in learning algorithms as a distortion-free constraint. Computer simulations using speech signals and stationary colored signals are performed for the conventional methods and for the new learning algorithms employing the proposed distortion-free constraint. The proposed method suppresses signal distortion well while maintaining high source separation performance.

  • Subband-Based Blind Separation for Convolutive Mixtures of Speech

    Shoko ARAKI  Shoji MAKINO  Robert AICHNER  Tsuyoki NISHIKAWA  Hiroshi SARUWATARI  

    PAPER-Engineering Acoustics

    E88-A No:12

    We propose utilizing subband-based blind source separation (BSS) for convolutive mixtures of speech. This is motivated by the drawback of frequency-domain BSS, i.e., when a long frame with a fixed long frame-shift is used to cover reverberation, the number of samples in each frequency decreases and the separation performance is degraded. In subband BSS, (1) by using a moderate number of subbands, a sufficient number of samples can be held in each subband, and (2) by using FIR filters in each subband, we can manage long reverberation. We confirm that subband BSS achieves better performance than frequency-domain BSS. Moreover, subband BSS allows us to select a separation method suited to each subband. Using this advantage, we propose efficient separation procedures that consider the frequency characteristics of room reverberation and speech signals (3) by using longer unmixing filters in low frequency bands and (4) by adopting an overlap-blockshift in BSS's batch adaptation in low frequency bands. Consequently, frequency-dependent subband processing is successfully realized with the proposed subband BSS.

  • Underdetermined Blind Separation of Convolutive Mixtures of Speech Using Time-Frequency Mask and Mixing Matrix Estimation

    Audrey BLIN  Shoko ARAKI  Shoji MAKINO  

    PAPER-Blind Source Separation

    E88-A No:7

    This paper focuses on the underdetermined blind source separation (BSS) of three speech signals mixed in a real environment from measurements provided by two sensors. To date, solutions to the underdetermined BSS problem have mainly been based on the assumption that the speech signals are sufficiently sparse. They involve designing binary masks that extract signals at time-frequency points where only one signal is assumed to exist. The major issue encountered in previous work relates to the occurrence of distortion, which corrupts the separated signals with loud musical noise. To overcome this problem, we propose combining sparseness with the use of an estimated mixing matrix. First, we use a geometrical approach to detect when only one source is active and to perform a preliminary separation with a time-frequency mask. This information is then used to estimate the mixing matrix, which allows us to improve our separation. Experimental results show that this combination of time-frequency mask and mixing matrix estimation provides separated signals of better quality (less distortion, less musical noise) than those extracted without using the estimated mixing matrix in reverberant conditions where the reverberation time (TR) was 130 ms and 200 ms. Furthermore, informal listening tests clearly show that musical noise is considerably reduced by the proposed method compared with classical approaches.
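
    The binary-mask idea that relies on sparseness can be sketched in a toy setting. The example below assumes a real-valued, instantaneous (non-reverberant) two-sensor mixture with known mixing directions, whereas the paper works on time-frequency points of reverberant speech and estimates the mixing matrix; the function `binary_mask_separate` and its parameters are hypothetical.

```python
import numpy as np

def binary_mask_separate(X, directions):
    """Assign each observation to the nearest mixing direction.

    X          : (2, n) array of two-sensor observations.
    directions : (K,) mixing angles in radians (assumed known here).
    Returns (Y, masks): Y[k] is the sensor-1 signal kept only where
    source k is judged to be the single active source.
    """
    # Angle of each observation vector, folded to [0, pi) so that the
    # sign of the source sample does not matter.
    theta = np.mod(np.arctan2(X[1], X[0]), np.pi)
    d = np.mod(np.asarray(directions), np.pi)
    # Circular distance on the interval [0, pi).
    diff = np.abs(theta[None, :] - d[:, None])
    diff = np.minimum(diff, np.pi - diff)
    labels = np.argmin(diff, axis=0)
    masks = labels[None, :] == np.arange(len(d))[:, None]
    return masks * X[0][None, :], masks
```

    When the sources never overlap, each masked output equals the corresponding source as observed at sensor 1. When they do overlap, a hard assignment discards part of the signal, which is the kind of distortion the abstract attributes to mask-only methods.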

  • Blind Source Separation of Convolutive Mixtures of Speech in Frequency Domain

    Shoji MAKINO  Hiroshi SAWADA  Ryo MUKAI  Shoko ARAKI  


    E88-A No:7

    This paper overviews a total solution for frequency-domain blind source separation (BSS) of convolutive mixtures of audio signals, especially speech. Frequency-domain BSS performs independent component analysis (ICA) in each frequency bin, and this is more efficient than time-domain BSS. We describe a sophisticated total solution for frequency-domain BSS, including permutation, scaling, circularity, and complex activation function solutions. Experimental results of 2×2, 3×3, 4×4, 6×8, and 2×2 (moving sources), (#sources × #microphones) in a room are promising.
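
    The scaling ambiguity arises because ICA in each frequency bin determines the unmixing matrix only up to an arbitrary complex gain per output. One widely used remedy, shown here as a sketch and not necessarily the exact procedure of the paper, rescales each bin's unmixing matrix by the diagonal of its inverse (often called projection back). The function name `projection_back` is hypothetical.

```python
import numpy as np

def projection_back(W):
    """Resolve per-bin scaling of unmixing matrices.

    W : (F, M, M) complex array, one square unmixing matrix per
        frequency bin (assumed invertible).
    Returns diag(inv(W_f)) @ W_f for every bin f.
    """
    A = np.linalg.inv(W)             # estimated mixing matrices, bin by bin
    d = np.einsum('fii->fi', A)      # diagonal entries of each inverse
    return d[:, :, None] * W         # scale row i of W_f by d[f, i]
```

    The result does not change if any row of W_f is multiplied by a nonzero gain beforehand, so the arbitrary scaling left by ICA is removed consistently across bins.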

  • Blind Source Separation for Moving Speech Signals Using Blockwise ICA and Residual Crosstalk Subtraction

    Ryo MUKAI  Hiroshi SAWADA  Shoko ARAKI  Shoji MAKINO  

    PAPER-Speech/Acoustic Signal Processing

    E87-A No:8

    This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency-domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and runs in real time.

  • Overdetermined Blind Separation for Real Convolutive Mixtures of Speech Based on Multistage ICA Using Subarray Processing

    Tsuyoki NISHIKAWA  Hiroshi ABE  Hiroshi SARUWATARI  Kiyohiro SHIKANO  Atsunobu KAMINUMA  

    PAPER-Speech/Acoustic Signal Processing

    E87-A No:8

    We propose a new algorithm for overdetermined blind source separation (BSS) based on multistage independent component analysis (MSICA). To improve the separation performance, we previously proposed MSICA, in which frequency-domain ICA and time-domain ICA are cascaded. The original MSICA assumed a specific mixing model in which the number of microphones equals the number of sources. However, additional microphones are required to achieve improved separation performance in reverberant environments. This leads to additional problems, e.g., a more complicated permutation problem. To solve them, we propose a new extended MSICA using subarray processing, where the number of microphones and the number of sources are set to be the same in every subarray. The experimental results obtained in a real environment reveal that the separation performance of the proposed MSICA improves as the number of microphones is increased.

  • Blind Source Separation of a Mixture of Communication Sources with Various Symbol Periods

    Sebastien HOUCKE  Antoine CHEVREUIL  Philippe LOUBATON  

    INVITED PAPER-Convolutive Systems

    E86-A No:3

    A blind source separation problem in a solicitations context is addressed. The mixture stems from several telecommunication signals, the symbol periods of which are unknown and possibly different. Cost functions are introduced, the optimization of which achieves the equalization for a user, i.e. estimation of the symbol period and the associated sequence of symbols. The method is iterated by implementing a deflation. The theoretical results are validated by simulations.