In this study, we aim to improve the performance of audio source separation for monaural mixture signals. For monaural audio source separation, semisupervised nonnegative matrix factorization (SNMF) can achieve higher separation performance by using a small amount of supervised signal data. In particular, penalized SNMF (PSNMF) with an orthogonality penalty is an effective method: it forces the basis matrices of the target and nontarget sources to be orthogonal to each other, which improves separation accuracy. However, the conventional orthogonality penalty is based on an inner product, and because of the scale indeterminacy between the basis and activation matrices in NMF, it does not properly constrain the estimation of the basis matrix. To cope with this problem, we propose a new PSNMF that penalizes the cosine similarity between the basis matrices. An experimental comparison shows the efficacy of the proposed cosine similarity penalty in supervised audio source separation.
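The scale problem can be illustrated with a minimal sketch. It assumes the penalty is a sum over all pairs of target and nontarget basis vectors (the exact penalty form and the names `F`, `H`, `inner_penalty`, `cosine_penalty` are assumptions for illustration). Shrinking a basis vector by 0.1 shrinks the inner-product penalty by the same factor, and NMF can compensate by scaling the activations up by 10, so the penalty can be reduced without changing the model. The cosine similarity is unaffected by that rescaling.

```python
import math

def inner_penalty(F, H):
    """Sum of inner products between every target basis vector in F and every
    nontarget basis vector in H (each vector is a list of nonnegative floats)."""
    return sum(sum(fi * hi for fi, hi in zip(f, h)) for f in F for h in H)

def cosine_penalty(F, H):
    """Sum of cosine similarities between every pair of target/nontarget basis vectors."""
    total = 0.0
    for f in F:
        for h in H:
            dot = sum(fi * hi for fi, hi in zip(f, h))
            norm = math.sqrt(sum(x * x for x in f)) * math.sqrt(sum(x * x for x in h))
            total += dot / norm
    return total

F = [[1.0, 2.0, 0.5]]   # one toy target basis spectrum
H = [[0.5, 1.0, 1.0]]   # one toy nontarget basis spectrum
# Shrink the target basis; in NMF the activations could grow 10x to keep F @ activations unchanged.
F_scaled = [[0.1 * x for x in f] for f in F]

print("inner product :", inner_penalty(F, H), "->", inner_penalty(F_scaled, H))
print("cosine        :", cosine_penalty(F, H), "->", cosine_penalty(F_scaled, H))
```

The inner-product penalty drops tenfold while the spectral shapes, and hence their overlap, are unchanged; the cosine penalty stays the same.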
Kazuhiro MURAKAMI Arata KAWAMURA Yoh-ichi FUJISAKA Nobuhiko HIRUMA Youji IIGUNI
In this paper, we propose a real-time blind source separation (BSS) system with two microphones that extracts only the desired sound sources. Assuming that the desired sound sources are close to the microphones, the proposed system suppresses distant sound sources as undesired. We previously developed a BSS system that can estimate the distance from a microphone to a sound source and suppress distant sound sources, but it could not operate in real time. The proposed system is a real-time version of that previous system, obtained by simplifying some of its BSS procedures. Simulation results show that the proposed system effectively suppresses distant source signals in real time and performs almost as well as the previous system.
Shinichi MOGAMI Yoshiki MITSUI Norihiro TAKAMUNE Daichi KITAMURA Hiroshi SARUWATARI Yu TAKAHASHI Kazunobu KONDO Hiroaki NAKAJIMA Hirokazu KAMEOKA
In this letter, we propose a new blind source separation method, independent low-rank matrix analysis based on generalized Kullback-Leibler divergence. This method assumes a time-frequency-varying complex Poisson distribution as the source generative model, which yields convex optimization in the spectrogram estimation. The experimental evaluation confirms the proposed method's efficacy.
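For reference, the generalized Kullback-Leibler divergence between an observed nonnegative spectrogram x and a model y is D(x | y) = Σ x log(x/y) − x + y. The sketch below only evaluates this divergence on flattened toy values; it does not reproduce the complex Poisson model, the demixing matrices, or the ILRMA update rules, and the function name is an assumption.

```python
import math

def generalized_kl(x, y):
    """Generalized KL divergence D(x | y) = sum(x*log(x/y) - x + y).
    x: nonnegative observations, y: positive model values.
    The x*log(x/y) term is taken as 0 when x == 0."""
    total = 0.0
    for xi, yi in zip(x, y):
        if xi > 0:
            total += xi * math.log(xi / yi)
        total += yi - xi
    return total

observed = [1.0, 2.0, 0.0]   # toy power-spectrogram bins
model = [2.0, 1.0, 0.5]      # toy low-rank model values
print(generalized_kl(observed, model))
print(generalized_kl(observed, observed[:2] + [1e-12]))  # near-perfect fit gives a near-zero value
```

The divergence is zero only when the model matches the observation and is not symmetric in its arguments, which is why the direction D(observation | model) matters in the spectrogram estimation.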
Maoshen JIA Jundai SUN Feng DENG Junyue SUN
In this work, a multiple source separation method with joint recovery of sparse and non-sparse components is proposed using dual similarity determination. Specifically, a dual similarity coefficient is designed from normalized cross-correlation and Jaccard coefficients, and its validity is confirmed by a statistical analysis of a quantitative effectiveness measure. Then, using the sparse components as a guide, the non-sparse components are recovered with the dual similarity coefficient. Finally, each separated signal is obtained by synthesizing its sparse and non-sparse components. Experimental results demonstrate that the proposed method achieves better separation quality than several existing BSS methods, including methods based on sparse component separation, independent component analysis, and soft thresholding.
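The two ingredients of the dual similarity coefficient can be sketched as follows. The abstract does not say how they are combined, so the product used in `dual_similarity` is a hypothetical choice, and the weighted (Ruzicka) Jaccard coefficient on magnitudes is one common real-valued generalization, assumed here rather than taken from the paper.

```python
import math

def ncc(a, b):
    """Zero-lag normalized cross-correlation of two real sequences (no mean removal)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def weighted_jaccard(a, b):
    """Weighted (Ruzicka) Jaccard coefficient for nonnegative sequences:
    sum of element-wise minima divided by sum of element-wise maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den > 0 else 0.0

def dual_similarity(a, b):
    """Hypothetical combination: product of NCC and weighted Jaccard on magnitudes."""
    mag_a = [abs(x) for x in a]
    mag_b = [abs(y) for y in b]
    return ncc(mag_a, mag_b) * weighted_jaccard(mag_a, mag_b)

component = [0.2, 1.0, 0.7, 0.1]              # toy magnitude frame
louder_copy = [2.0 * x for x in component]
print(dual_similarity(component, component))    # identical frames
print(dual_similarity(component, louder_copy))  # same shape, twice the level
```

The design point this shows: NCC alone ignores level (the louder copy still scores 1), while the Jaccard term penalizes the level mismatch, so the combined score drops to 0.5.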
Chao SUN Ling YANG Juan DU Fenggang SUN Li CHEN Haipeng XI Shenglei DU
In this paper, we first propose two batch blind source separation and equalization algorithms based on support vector regression (SVR) for linear time-invariant multiple-input multiple-output (MIMO) systems. The proposed algorithms combine the conventional SVR cost function with the error functions of classical online blind equalization algorithms: the error functions of the constant modulus algorithm (CMA) and the radius directed algorithm (RDA) are incorporated into the penalty term of SVR. To recover all sources simultaneously, the cross-correlations of the equalizer outputs are included in the cost functions. Simulation experiments show that the proposed algorithms successfully recover all sources while compensating for channel distortion. Because they use the iterative re-weighted least squares (IRWLS) solution of SVR, the proposed algorithms have low computational complexity. Compared with traditional algorithms, the new algorithms require fewer samples to converge and achieve lower residual interference. Because single-mode algorithms based on the constant modulus property usually show relatively high residual error for multilevel signals, we further propose two dual-mode blind source separation and equalization schemes. Of these, the SVR-based dual-mode scheme requires fewer samples to converge and further reduces the residual interference.
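The residual-error argument for multilevel signals can be checked directly on the classical error terms. The sketch uses the standard CMA dispersion constant R2 = E|s|^4 / E|s|^2 and a radius-directed error that measures |y|^2 against the nearest constellation radius; it does not reproduce the SVR penalty or IRWLS solver, and the 16-QAM constellation is an assumed example.

```python
import math

# Assumed example: square 16-QAM with levels {-3, -1, 1, 3} on each axis.
LEVELS = (-3, -1, 1, 3)
QAM16 = [complex(i, q) for i in LEVELS for q in LEVELS]

def cma_dispersion(constellation):
    """CMA dispersion constant R2 = E|s|^4 / E|s|^2 for equiprobable symbols."""
    powers = [s.real ** 2 + s.imag ** 2 for s in constellation]
    m2 = sum(powers) / len(powers)
    m4 = sum(p * p for p in powers) / len(powers)
    return m4 / m2

def cma_error(y, r2):
    """CMA error term |y|^2 - R2: zero only when |y|^2 equals the single modulus R2."""
    return abs(y) ** 2 - r2

def rda_error(y, radii):
    """Radius-directed error |y|^2 - R_k^2, where R_k is the constellation radius nearest |y|."""
    r_k = min(radii, key=lambda r: abs(abs(y) - r))
    return abs(y) ** 2 - r_k ** 2

R2 = cma_dispersion(QAM16)
RADII = sorted({math.sqrt(s.real ** 2 + s.imag ** 2) for s in QAM16})

perfect = 1 + 1j   # a perfectly equalized inner-ring symbol
print("R2 =", R2)
print("CMA error:", cma_error(perfect, R2))
print("RDA error:", rda_error(perfect, RADII))
```

Even a perfectly equalized 16-QAM symbol leaves a nonzero CMA error, because the constellation has three radii but CMA targets one modulus; the radius-directed error is zero for it.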
An online nonnegative matrix factorization (NMF) algorithm based on recursive least squares (RLS) is described in matrix form, and a simplified, low-complexity version is developed for a frame-by-frame online audio source separation system. First, the RLS-based online NMF algorithm is formulated as a recursive solution of the NMF problem. Next, a simplified algorithm is developed that approximates the RLS-based online NMF algorithm with low complexity. The proposed algorithms are evaluated on audio source separation, and the results show that they outperform the conventional online NMF algorithm with significantly reduced complexity.
Zongli RUAN Ping WEI Guobing QIAN Hongshu LIAO
Information maximization (Infomax), based on information entropy theory, is a class of methods that can be used to blindly separate sources. Torkkola applied the Infomax criterion to blindly separate mixtures in which the sources are delayed with respect to each other. Compared with frequency-domain methods, this time-domain method has simple adaptation rules and is easy to implement. However, Torkkola's method works only for real-valued signals. In this letter, Infomax for blind separation of delayed sources is extended to the complex case so that complex-valued signals can be processed. First, adaptation rules for the parameters of the unmixing network are derived by gradient ascent, and the steps of the algorithm are given. Then, a measurement matrix is constructed to evaluate the separation performance. Computer experiments support the validity of the extended algorithm.
Minook KIM Tae-Jun LEE Hyung-Min PARK
This letter presents a two-stage method that extends the degenerate unmixing estimation technique (DUET) to reverberant speech separation. First, frequency-bin-wise attenuation and delay parameters are introduced and estimated by online update rules to handle early reflections. Next, a mask re-estimation algorithm based on the precedence effect is developed to detect and correct errors in the binary masks caused by late reflections. Experimental results demonstrate that the proposed method significantly improves separation performance.
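As background for the first stage, classic DUET computes, at each time-frequency point, a relative attenuation and delay from the ratio of the two mixtures' STFT coefficients and assigns the point to the nearest source in that parameter space, which yields a binary mask. The sketch shows only this classic step; the paper's frequency-bin-wise online estimation and the precedence-effect mask re-estimation are not reproduced, and the source parameters in the example are hypothetical.

```python
import cmath

def duet_features(x1, x2, omega):
    """Classic DUET local estimates at one time-frequency point.
    x1, x2: complex STFT coefficients of the two mixtures (x1 != 0);
    omega: angular frequency in rad/sample (> 0).
    Returns (symmetric attenuation a - 1/a, relative delay in samples)."""
    ratio = x2 / x1
    a = abs(ratio)
    delay = -cmath.phase(ratio) / omega   # valid only while |omega * delay| < pi
    return a - 1.0 / a, delay

def binary_mask_label(features, source_params):
    """Index of the source whose (symmetric attenuation, delay) pair is nearest;
    the point's mask is 1 for that source and 0 for the others."""
    alpha, delta = features
    return min(range(len(source_params)),
               key=lambda k: (alpha - source_params[k][0]) ** 2
                             + (delta - source_params[k][1]) ** 2)

# Toy point dominated by a source with attenuation 1.2 and a 2-sample delay.
omega = 0.3
x1 = 1 + 0j
x2 = 1.2 * cmath.exp(-1j * omega * 2.0)
features = duet_features(x1, x2, omega)
sources = [(0.0, 0.0), (1.2 - 1 / 1.2, 2.0)]   # hypothetical (alpha, delay) per source
print(features, binary_mask_label(features, sources))
```

Reverberation smears these local estimates, which is the failure mode the two proposed stages target.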
Hiromitsu AWANO Hiroshi TSUTSUI Hiroyuki OCHI Takashi SATO
Random telegraph noise (RTN) is a phenomenon that is considered to limit the reliability and performance of circuits using advanced devices. The time constants of carrier capture and emission and the associated change in the threshold voltage are important parameters commonly included in various models, but extracting them from time-domain observations has been difficult. In this study, we propose a statistical method for simultaneously estimating interrelated parameters: the time constants and the magnitude of the threshold voltage shift. Our method is based on a graphical network representation, and the parameters are estimated using the Markov chain Monte Carlo method. The proposed method was successfully applied to both synthetic and measured time-domain RTN signals. Because it can handle the interrelated parameters of multiple traps, it contributes to the construction of more accurate RTN models.
Sang Ha PARK Seokjin LEE Koeng-Mo SUNG
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because NMF separates the musical sound mixture into more signals than there are musical tracks. In conventional methods, clustering is performed manually or by training-based clustering with an additional learning process. Recently, a clustering algorithm based on mel-frequency cepstral coefficients (MFCCs) was proposed for unsupervised clustering. However, MFCCs supply limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm that uses these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance compared with conventional MFCC-based clustering.
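To make unsupervised clustering of NMF bases concrete, here is a minimal sketch using a single timbre feature, the spectral centroid, and a 1-D two-cluster k-means. The abstract does not list the proposed features, so the spectral centroid stands in as an assumed example of a common timbre descriptor; the bin frequencies and bases are toy values.

```python
def spectral_centroid(basis, bin_freqs):
    """Magnitude-weighted mean frequency (Hz) of a nonnegative spectral basis vector."""
    return sum(f * m for f, m in zip(bin_freqs, basis)) / sum(basis)

def two_means_1d(values, iters=20):
    """Tiny 1-D k-means with k=2, initialized at the smallest and largest values."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

bin_freqs = [100, 200, 400, 800, 1600]   # toy bin center frequencies in Hz
bases = [
    [5, 3, 1, 0, 0],   # low-register bases (e.g. one instrument)
    [4, 4, 1, 0, 0],
    [0, 0, 1, 3, 5],   # bright, high-register bases (e.g. another instrument)
    [0, 1, 1, 4, 4],
]
centroids = [spectral_centroid(b, bin_freqs) for b in bases]
labels = two_means_1d(centroids)
print(centroids, labels)
```

Each cluster label then selects which NMF components are summed to rebuild one track; a richer feature vector would replace the scalar centroid with a multi-dimensional distance.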
Seokjin LEE Sang Ha PARK Koeng-Mo SUNG
In this paper, a geometric source separation system using nonnegative matrix factorization (NMF) is proposed. The adaptive beamformer is the best method for geometric source separation, but it suffers from a “target signal cancellation” problem in multi-path situations. We modified the HALS-NMF algorithm to decompose the signals into bases and developed an interference suppression module that cancels the interference bases. The proposed system was compared with a subband GSC-RLS algorithm in MATLAB® simulations; the results show that the proposed system is robust in multi-path situations.
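For readers unfamiliar with HALS-NMF, the sketch below is the plain hierarchical alternating least squares update for V ≈ WH under the Frobenius norm: each row of H and each column of W is updated in turn by its exact nonnegative least-squares step. It is the unmodified textbook algorithm, not the paper's modified version or its interference suppression module, and the small floor `eps` is an implementation choice.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def hals_nmf(V, rank, iters=100, eps=1e-9, seed=0):
    """Basic HALS-NMF: V (m x n, nonnegative) ~ W (m x rank) H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        WtV = matmul(transpose(W), V)   # rank x n
        WtW = matmul(transpose(W), W)   # rank x rank
        for k in range(rank):           # update each row of H in turn
            WtWH_k = [sum(WtW[k][j] * H[j][t] for j in range(rank)) for t in range(n)]
            H[k] = [max(eps, H[k][t] + (WtV[k][t] - WtWH_k[t]) / WtW[k][k]) for t in range(n)]
        VHt = matmul(V, transpose(H))   # m x rank
        HHt = matmul(H, transpose(H))   # rank x rank
        for k in range(rank):           # update each column of W in turn
            for i in range(m):
                WHHt_ik = sum(W[i][j] * HHt[j][k] for j in range(rank))
                W[i][k] = max(eps, W[i][k] + (VHt[i][k] - WHHt_ik) / HHt[k][k])
    return W, H

V = [[1, 2, 3], [2, 4, 6], [1, 1, 1], [2, 3, 4]]   # toy exactly rank-2 nonnegative matrix
W0, H0 = hals_nmf(V, 2, iters=0)                    # the random initialization
W, H = hals_nmf(V, 2, iters=50)
print(frob_error(V, W0, H0), "->", frob_error(V, W, H))
```

Updating one rank-1 component at a time is what makes HALS attractive for separating bases, since each basis can be inspected or suppressed individually afterward.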
Navid TAFAGHODI KHAJAVI Siavash SADEGHI IVRIGH Seyed Mohammad-Sajad SADOUGH
Cognitive radio (CR) is a key solution to the problem of inefficient use of spectral resources. Spectrum sensing in each CR aims to detect whether a preassigned spectrum band is occupied by a primary user. Conventional techniques do not allow the CR to communicate with its own base station during spectrum sensing, so only part of each frame can be used for cognitive data transmission. In this paper, we introduce a new spectrum sensing framework that combines a blind source separation technique with conventional spectrum sensing techniques. With this framework, the cognitive transmitter can continue transmitting during spectrum sensing if it was active in the previous frame. Moreover, accuracy is improved because the decision made by the spectrum sensing unit in each frame depends on the decision made in the previous frame. We use Markov chain tools to model the behavior of the proposed scheme and to derive the parameters that characterize its performance. Numerical results confirm the superiority of the proposed technique over conventional spectrum sensing techniques.
Kenta NIWA Takanori NISHINO Kazuya TAKEDA
A sound field reproduction method that uses blind source separation and a head-related transfer function is proposed. In the proposed system, multichannel acoustic signals captured by distant microphones are decomposed into a set of location/signal pairs of virtual sound sources based on frequency-domain independent component analysis. After the locations and signals of the virtual sources are estimated, the spatial sound at the selected point is constructed by convolving the controlled acoustic transfer functions with each signal. In experiments, a sound field created by six sound sources is captured using 48 distant microphones and decomposed into sets of virtual sound sources. Because subjective evaluation shows no significant difference between natural and reconstructed sound when six virtual sources are used, the effectiveness of the decomposition algorithm and of the virtual source representation is confirmed.
Keiichi OSAKO Yoshimitsu MORI Yu TAKAHASHI Hiroshi SARUWATARI Kiyohiro SHIKANO
We propose a new blind source separation (BSS) algorithm that combines independent component analysis (ICA) with frequency subband beamforming interpolation. Slow convergence in optimizing the separation filters is a known problem in ICA. Our approach to this problem is based on the relationship between ICA and null beamforming (NBF). The proposed method consists of three parts: (I) a frequency subband selector for ICA learning, (II) a frequency-domain ICA part with direction-of-arrival (DOA) estimation of the sound sources, and (III) an interpolation part that uses null beamforming constructed from the estimated DOAs. Signal separation experiments under a reverberant condition show that the proposed method converges faster than conventional ICA-based BSS methods.
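Part (III) relies on building a null beamformer from an estimated DOA. A minimal sketch, assuming a two-microphone far-field plane-wave model and a delay-and-subtract filter [1, −e^{jωτ0}] with τ0 = d·sin(θ0)/c; the speed of sound, spacing, and angles are assumed values, and the ICA and subband selection parts are not shown.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def null_beamformer_gain(freq_hz, mic_spacing_m, null_deg, look_deg):
    """|response| of a 2-mic delay-and-subtract null beamformer at one frequency.
    The filter [1, -exp(j*w*tau0)] places a spatial null at null_deg; the input is a
    unit plane wave arriving from look_deg (far-field model)."""
    w = 2.0 * math.pi * freq_hz
    tau0 = mic_spacing_m * math.sin(math.radians(null_deg)) / SPEED_OF_SOUND
    tau = mic_spacing_m * math.sin(math.radians(look_deg)) / SPEED_OF_SOUND
    x1, x2 = 1.0, cmath.exp(-1j * w * tau)   # mic 2 receives the wave delayed by tau
    y = x1 - cmath.exp(1j * w * tau0) * x2
    return abs(y)

# Null steered to an (assumed) estimated DOA of 30 degrees, 4 cm spacing, 1 kHz.
for look in (30, 0, -30):
    print(look, null_beamformer_gain(1000.0, 0.04, 30, look))
```

A signal from the estimated DOA is cancelled while other directions pass, which is why such a beamformer can stand in for ICA filters in frequency bins where ICA is not run.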
Yalan YE Zhi-Lin ZHANG Jia CHEN
Fetal electrocardiogram (FECG) extraction is of vital importance in biomedical signal processing. A promising approach is blind source extraction (BSE), which emerged from the neural network field and is generally implemented in a semi-blind way. In this paper, we propose a robust extraction algorithm that extracts a clear FECG as the first extracted signal. The algorithm exploits the fact that the kurtosis of the FECG signal lies in a specific range, whereas the kurtosis values of other, unwanted signals do not. Moreover, the algorithm is very robust to outliers; this robustness is analyzed theoretically and confirmed by simulation. In addition, the algorithm works well in some adverse situations in which the kurtosis values of some source signals are very close to each other. For these reasons, the algorithm is an appealing method for obtaining an accurate and reliable FECG.
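The kurtosis criterion can be illustrated with the sample excess kurtosis E[(x−m)^4]/E[(x−m)^2]^2 − 3. Spiky, ECG-like waveforms are strongly super-Gaussian, whereas a sinusoid has excess kurtosis −1.5. The range bounds passed to `select_in_range` are hypothetical placeholders, not values from the paper, and the extraction algorithm itself is not reproduced.

```python
import math

def kurtosis(x):
    """Sample excess kurtosis: E[(x-m)^4] / E[(x-m)^2]^2 - 3."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0

def select_in_range(signals, low, high):
    """Indices of candidate signals whose kurtosis lies in [low, high] (bounds hypothetical)."""
    return [i for i, s in enumerate(signals) if low <= kurtosis(s) <= high]

sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]          # smooth interference
spikes = [10.0 if k % 50 == 0 else 0.0 for k in range(1000)]           # toy QRS-like spike train
print(kurtosis(sine), kurtosis(spikes))
print(select_in_range([sine, spikes], 5.0, 100.0))
```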
Yijing CHU Heping DING Xiaojun QIU
Assuming there are short time periods in which only one source is active, a new approach for source separation is proposed. An affine projection adaptation algorithm with a non-orthogonal constraint shows excellent noise immunity, a high convergence rate, and good tracking capability to efficiently obtain a solution to the separation filters.
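A generic affine projection (AP) adaptive filter of projection order 2 is sketched below, identifying an FIR path from a reference signal to a second channel. How the paper uses single-source periods and its non-orthogonal constraint are not reproduced; the order, step size `mu`, regularization `delta`, and the toy path are assumptions.

```python
import random

def ap_identify(x, d, num_taps, mu=0.5, delta=1e-6):
    """Affine projection filter, projection order 2, for FIR system identification.
    x: input samples, d: desired samples; returns the adapted weights.
    Update: w += mu * U (U^T U + delta*I)^-1 e, with U = [u(n), u(n-1)]."""
    w = [0.0] * num_taps

    def regressor(n):
        # The num_taps most recent input samples ending at index n (zeros before the start).
        return [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]

    for n in range(1, len(x)):
        u0, u1 = regressor(n), regressor(n - 1)
        e0 = d[n] - sum(wi * ui for wi, ui in zip(w, u0))
        e1 = d[n - 1] - sum(wi * ui for wi, ui in zip(w, u1))
        # Solve the regularized 2x2 system G a = e with G = U^T U + delta*I.
        g00 = sum(u * u for u in u0) + delta
        g11 = sum(u * u for u in u1) + delta
        g01 = sum(p * q for p, q in zip(u0, u1))
        det = g00 * g11 - g01 * g01
        a0 = (g11 * e0 - g01 * e1) / det
        a1 = (g00 * e1 - g01 * e0) / det
        w = [wi + mu * (a0 * p + a1 * q) for wi, p, q in zip(w, u0, u1)]
    return w

rng = random.Random(1)
true_path = [0.5, -0.3, 0.2, 0.1]          # toy FIR path between channels
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(true_path[k] * x[n - k] for k in range(4) if n - k >= 0) for n in range(len(x))]
w = ap_identify(x, d, 4)
print([round(v, 4) for v in w])
```

Projecting onto the last two input vectors instead of one (as NLMS does) speeds convergence for colored inputs such as speech.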
Akihide HORITA Kenji NAKAYAMA Akihiro HIRANO
FeedForward (FF-) Blind Source Separation (BSS) systems have some degree of freedom in the solution space, so signal distortion is likely to occur. First, a criterion for signal distortion is discussed, and the properties of conventional methods proposed to suppress it are analyzed. Next, a general condition for complete, distortion-free separation is derived for multi-channel FF-BSS systems. This condition is incorporated into learning algorithms as a distortion-free constraint. Computer simulations using speech signals and stationary colored signals are performed for the conventional methods and for the new learning algorithms employing the proposed distortion-free constraint. The proposed method suppresses signal distortion well while maintaining high source separation performance.
Mohammad E. HAMID Takeshi FUKABAYASHI
A time-domain (TD) speech enhancement technique for improving the SNR of noise-contaminated speech is proposed. A supplementary scheme is used to estimate the degree of noise in the noisy speech. The degree of noise is estimated from a function, prepared in advance, of a parameter related to the degree of noise. This function is obtained by the least squares (LS) method from given degrees of noise and the corresponding estimated values of the parameter, which is computed from the autocorrelation function (ACF) on a frame-by-frame basis. This estimator estimates the degree of noise almost accurately and is useful for noise reduction. The proposed method is based on two-stage processing. In the first stage, subtraction in the time domain (STD), which is equivalent to ordinary spectral subtraction (SS), is carried out, reducing the noise to a certain level. In the second stage, the remaining noise and the by-product noise residual are further reduced by applying a blind source separation (BSS) technique in the time domain. Because the method performs single-channel speech enhancement, a second signal is generated by taking the noise characteristics into account so that BSS can be applied; this generated signal plays a very important role in BSS. This paper presents an adaptive algorithm for separating sources in convolutive mixtures modeled by finite impulse response (FIR) filters. The FIR filter coefficients are estimated by decorrelating the two mixtures. Only one signal of interest is recovered: the voice of the primary speaker, free from interfering noise. In the experiments, different levels of noise are added to the clean speech signal, and the SNR improvement at each stage is investigated. The noise types initially considered are synthesized white and colored noise at SNRs from 0 to 30 dB; the proposed method is also tested with other real-world noises. The results show that satisfactory SNR improvement is attained with the two-stage processing.
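The noise-degree estimator can be sketched in two pieces: a frame-wise ACF parameter and an LS fit that maps the parameter to a degree of noise. The abstract specifies neither the parameter nor the function form, so using the normalized ACF r(lag)/r(0) and a straight-line fit are both assumptions, and the calibration pairs below are toy values, not data from the study.

```python
def normalized_acf(frame, lag):
    """r(lag) / r(0) for one frame (biased autocorrelation estimate, mean not removed)."""
    r0 = sum(v * v for v in frame)
    r_lag = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    return r_lag / r0

def fit_line(params, degrees):
    """Ordinary least-squares fit: degree ~ slope * param + intercept."""
    n = len(params)
    mp = sum(params) / n
    md = sum(degrees) / n
    sxy = sum((p - mp) * (d - md) for p, d in zip(params, degrees))
    sxx = sum((p - mp) ** 2 for p in params)
    slope = sxy / sxx
    return slope, md - slope * mp

# Toy calibration: known degrees of noise paired with the parameter measured at each one.
params = [0.1, 0.2, 0.3, 0.4]
degrees = [1.2, 1.4, 1.6, 1.8]
slope, intercept = fit_line(params, degrees)

# At run time, a frame's parameter is mapped through the fitted function.
frame = [1.0, 1.0, 1.0, 1.0]
p = normalized_acf(frame, 1)
print(slope, intercept, p, slope * p + intercept)
```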
Nuo ZHANG Jianming LU Takashi YAHAGI
In this study, we propose a robust approach to blind source separation (BSS) using radial basis function networks (RBFNs) and higher-order statistics (HOS). The RBFN is employed to estimate the inverse of a hypothetical complicated mixing process. It transforms the observed signals into a high-dimensional space in which the transformed signals can be separated simply by using a cost function. Recently, Tan et al. proposed a nonlinear BSS method in which higher-order moments between the source signals and the observations are matched in the cost function. However, that method has the strict requirement that the higher-order statistics of the sources be known. To remove this constraint, we propose a cost function consisting of higher-order cumulants and the second-order moment of the signals. The proposed approach can not only recover the complicated mixed signals but also reduce noise in the observed signals. Simulation results demonstrate the validity of the proposed approach, and an application to X-ray image separation shows its practical applicability.
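Two building blocks named in this abstract can be sketched separately: a Gaussian RBF hidden layer that lifts an observation into a higher-dimensional feature space followed by a linear output layer, and the fourth-order cross-cumulant cum(y1, y1, y2, y2), which vanishes for independent zero-mean outputs. The centers, width, and weights are hypothetical, and the paper's full cost function is not reproduced.

```python
import math

def rbf_features(x, centers, width):
    """Gaussian RBF activations exp(-||x - c||^2 / (2 * width^2)) for each center c."""
    feats = []
    for c in centers:
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-dist2 / (2.0 * width ** 2)))
    return feats

def rbfn_output(x, centers, width, weights):
    """Linear output layer: each output is a weighted sum of the RBF activations."""
    phi = rbf_features(x, centers, width)
    return [sum(w * p for w, p in zip(row, phi)) for row in weights]

def cross_cumulant_22(y1, y2):
    """Sample fourth-order cross-cumulant cum(y1, y1, y2, y2) for zero-mean signals:
    E[y1^2 y2^2] - E[y1^2] E[y2^2] - 2 E[y1 y2]^2."""
    n = len(y1)
    e11 = sum(a * a for a in y1) / n
    e22 = sum(b * b for b in y2) / n
    e12 = sum(a * b for a, b in zip(y1, y2)) / n
    e1122 = sum(a * a * b * b for a, b in zip(y1, y2)) / n
    return e1122 - e11 * e22 - 2.0 * e12 ** 2

centers = [[0.0, 0.0], [3.0, 4.0]]   # hypothetical RBF centers
print(rbfn_output([0.0, 0.0], centers, 1.0, [[1.0, 1.0]]))
y1 = [1.0, -1.0, 1.0, -1.0]
y2 = [1.0, 1.0, -1.0, -1.0]
print(cross_cumulant_22(y1, y2), cross_cumulant_22(y1, y1))
```

A cost built from such cumulants needs no prior knowledge of the sources' higher-order moments, which is the restriction the abstract removes.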
Md. Khademul Islam MOLLA Keikichi HIROSE Nobuaki MINEMATSU
The Hilbert transformation together with empirical mode decomposition (EMD) produces the Hilbert spectrum (HS), a fine-resolution time-frequency representation of any nonlinear and non-stationary signal. EMD decomposes the mixture signal into oscillatory components, each of which is called an intrinsic mode function (IMF). A modification of the conventional EMD is proposed here. The instantaneous frequency of every real-valued IMF component is computed with the Hilbert transformation, and the HS is constructed by arranging the instantaneous frequency spectra of the IMF components. The HS of the mixture signal is decomposed into subspaces corresponding to the component sources. The decomposition is performed by applying independent component analysis (ICA) and Kullback-Leibler divergence based K-means clustering to a selected number of bases derived from the HS of the mixture. The time-domain source signals are then assembled by post-processing the subspaces. Experimental results obtained with the proposed separation technique are presented.
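The instantaneous-frequency step can be sketched for a single IMF-like component: form the discrete analytic signal via the DFT (doubling positive frequencies and zeroing negative ones) and convert the phase increment between samples to hertz. EMD itself, the proposed modification, and the ICA/clustering stages are not shown; the O(n²) DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def analytic_signal(x):
    """Discrete analytic signal via the DFT: keep DC (and Nyquist for even n),
    double positive frequencies, zero negative ones, then invert. O(n^2), illustration only."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    Z = [X[k] * h[k] for k in range(n)]
    return [sum(Z[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def instantaneous_frequency(x, fs):
    """Instantaneous frequency in Hz from the phase increment of the analytic signal."""
    z = analytic_signal(x)
    return [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2.0 * math.pi)
            for t in range(len(z) - 1)]

fs = 1000.0
tone = [math.cos(2 * math.pi * 62.5 * t / fs) for t in range(128)]   # toy single-component IMF
freqs = instantaneous_frequency(tone, fs)
print(min(freqs), max(freqs))
```

Stacking these per-sample frequencies (weighted by the analytic-signal amplitude) for every IMF is what forms the Hilbert spectrum that the method later decomposes.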