Ryo MUKAI Hiroshi SAWADA Shoko ARAKI Shoji MAKINO
This paper describes a real-time blind source separation (BSS) method for moving speech signals in a room. Our method employs frequency domain independent component analysis (ICA) using a blockwise batch algorithm in the first stage, and the separated signals are refined by postprocessing using crosstalk component estimation and non-stationary spectral subtraction in the second stage. The blockwise batch algorithm achieves better performance than an online algorithm when sources are fixed, and the postprocessing compensates for performance degradation caused by source movement. Experimental results using speech signals recorded in a real room show that the proposed method realizes robust real-time separation for moving sources. Our method is implemented on a standard PC and works in real time.
Xiaomin WANG Daisuke KUNIMATSU Tatsushi HASEGAWA Akira SUZUKI
We demonstrate wide-band (> 25-nm), long-distance (> 1000-km) chromatic dispersion compensation by midway spectral inversion (MSI) using a periodically poled LiNbO3 device. To achieve a flat zero net dispersion, the fourth-order dispersion of the single-mode fibers is canceled by MSI, while the third-order dispersion is compensated for by a negative-slope dispersion compensation fiber (NS-DCF). The second-order dispersion is canceled out by both. Long-distance propagation is realized by a double recirculation-loop system. A very flat zero dispersion is measured for the first time over more than 1000 km of single-mode fiber propagation with MSI dispersion compensation.
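The split between even-order and odd-order dispersion terms can be sketched with a short derivation. This is an idealized illustration, not the authors' analysis: it assumes two identical fiber halves of length L/2, an ideal spectral inverter at the midpoint, and a frequency offset Δω measured from the inversion center. The symbols β_n, L, and φ are introduced here for illustration only.

```latex
% Spectral phase accumulated in one fiber half of length L/2
\phi(\Delta\omega) = \frac{L}{2}\left(\frac{\beta_2}{2}\,\Delta\omega^{2}
  + \frac{\beta_3}{6}\,\Delta\omega^{3}
  + \frac{\beta_4}{24}\,\Delta\omega^{4}\right)

% Ideal midway spectral inversion maps the first-half phase
% \phi(\Delta\omega) to -\phi(-\Delta\omega); the second half adds \phi(\Delta\omega).
\phi_{\mathrm{net}}(\Delta\omega) = \phi(\Delta\omega) - \phi(-\Delta\omega)
  = \frac{L}{2}\cdot\frac{\beta_3}{3}\,\Delta\omega^{3}
```

Under these assumptions the β_2 and β_4 terms drop out and only the β_3 term survives, which is consistent with the abstract's use of a separate fiber for the third-order dispersion.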
Yusuke HIWASAKI Kazunori MANO Kazutoshi YASUNAGA Toshiyuki MORII Hiroyuki EHARA Takao KANEKO
This paper presents an efficient LSP quantizer implementation for low bit-rate coders. The major feature of the quantizer is that it uses a truncated cepstral distance criterion for the code selection procedure. This approach has generally been considered too computationally costly. We utilized the quantizer with a moving-average predictor, two-stage-split vector quantizer and delayed decision. We have investigated the optimal parameter settings in this case and incorporated the quantizer thus obtained into an ITU-T 4-kbit/s speech coding candidate algorithm with a bit budget of 21 bits. The objective performance is better than that with a conventional weighted mean-square criterion, while the complexity is still kept to a reasonable level. The paper also describes the codebook design and techniques that were employed to achieve robustness in noisy channel conditions.
Hongseok KWON Jongmok SON Keunsung BAE
This paper describes a new speech enhancement system that employs a microphone array with post-processing based on a minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator. To obtain a more accurate MMSE-STSA estimator in a microphone array, modification and refinement procedures are carried out on each microphone output. The performance of the proposed system is compared with that of other microphone-array methods. Noise removal experiments with white and pink noise demonstrate that the proposed speech enhancement system outperforms the other microphone-array methods in average output SNR and cepstral distance measures.
Rajkishore PRASAD Hiroshi SARUWATARI Kiyohiro SHIKANO
This paper deals with the statistical modeling of a Time-Frequency Series of Speech (TFSS), obtained by Short-Time Fourier Transform (STFT) analysis of the speech signal picked up by a linear microphone array with two elements. We have attempted to find a closer match between the distribution of the TFSS and theoretical distributions such as the Laplacian Distribution (LD), Gaussian Distribution (GD), and Generalized Gaussian Distribution (GGD), with parameters estimated from the TFSS data. It has been found that the GGD provides the best model for the real part, imaginary part, and polar magnitude of the time series of the spectral components. The distribution of the polar magnitude is closer to the LD than that of the real and imaginary parts. The distributions of the real and imaginary parts of the TFSS are strongly Laplacian. The phase of the TFSS has been found to be uniformly distributed. The use of a GGD-based model as the PDF in fixed-point Frequency Domain Independent Component Analysis (FDICA) provides better separation performance and improves convergence speed significantly.
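As an illustration of how the GGD family contains both the Laplacian and Gaussian cases, the following minimal sketch evaluates a zero-mean GGD density. The parameter names `alpha` (scale) and `beta` (shape) are assumptions chosen here; the abstract does not state the authors' parameterization or estimation procedure.

```python
import math

def ggd_pdf(x, alpha, beta):
    """Zero-mean generalized Gaussian density with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))

# beta = 1 reduces to a Laplacian, beta = 2 to a Gaussian
# (alpha = sqrt(2) then corresponds to unit variance).
laplacian_peak = ggd_pdf(0.0, alpha=1.0, beta=1.0)
gaussian_peak = ggd_pdf(0.0, alpha=math.sqrt(2.0), beta=2.0)
```

Shape values below 1 give densities more sharply peaked than a Laplacian, which is one reason a fitted shape parameter can track speech spectra more closely than either fixed model.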
We propose Optimal Temporal Decomposition (OTD) of speech for voice morphing that preserves the Δ cepstrum. OTD is an optimal modification of the original Temporal Decomposition (TD) by B. Atal. It is theoretically shown that OTD can achieve minimal spectral distortion for the TD-based approximation of time-varying LPC parameters. Moreover, by applying OTD to Δ cepstrum preservation, it is also theoretically shown that the Δ cepstrum of a target speaker can be reflected in that of a source speaker. In frequency domain interpolation, the Laplacian Spectral Distortion (LSD) measure is introduced to improve non-uniform frequency warping based on the Inverse Function of Integrated Spectrum (IFIS). Experimental results indicate that the Δ cepstrum of the OTD-based morphing spectra of a source speaker is nearly equal to that of a target speaker except for a piecewise constant factor, and subjective listening tests show that the speech intelligibility of the proposed morphing method is superior to that of the conventional method.
Naoki KOBAYASHI Kaoru NARITA Taras KUSHTA Hirokazu TOHYA
We have developed an algorithm called the "spectral-domain-to-real-space approach" (SDRSA) to analytically calculate radiation from the two-dimensional current density distribution in microstrip line configurations where the microstrip lines are represented in the form of a three-dimensional inhomogeneous structure. The algorithm is based on the spectral-domain approach used to estimate radiation from microstrip line configurations. Calculation results obtained by using the SDRSA and the current density distribution from a quasi-TEM mode model of microstrip lines agree well with the corresponding estimations obtained by using the equivalent electric current source method and the magnetic current source method, and with the experimental results obtained in the frequency band of up to 1 GHz.
The adaptive cross-spectral (ACS) technique recently introduced by Okuno et al. provides an attractive solution to acoustic echo cancellation (AEC) as it does not require double-talk (DT) detection. In this paper, we first introduce a generalized ACS (GACS) technique in which a step-size parameter controls the magnitude of the incremental correction applied to the coefficient vector of the adaptive filter. Based on a study of the effects of the step-size on the GACS convergence behaviour, a new variable step-size ACS (VSS-ACS) algorithm is proposed, in which the value of the step-size is controlled dynamically by a special finite state machine. Furthermore, the proposed algorithm has a new adaptation scheme to improve the initial convergence rate when the network connection is created. Experimental results show that the new VSS-ACS algorithm outperforms the original ACS, with higher acoustic echo attenuation during DT periods and a faster convergence rate.
Takeshi SHIRAISHI Toshio NISHIKAWA Kikuo WAKINO Toshihide KITAZAWA
A novel hybrid numerical method, based on the extended spectral domain approach combined with the mode-matching method, is applied to evaluate the scattering parameters of waveguide discontinuities. The formulation procedure utilizes the biorthogonal relation in the transformation, and the Green's functions in the spectral domain are obtained easily even in inhomogeneous lossy regions. The present method does not rely on an approximate perturbational scheme, and it can evaluate, accurately and stably and in a short computation time, the scattering parameters of both thin and thick obstacles made of a wide variety of materials, from lossless dielectrics to highly conductive media. The physical phenomena of transmission through lossy obstacles are investigated by numerical computations. The results are compared with FEM where FEM computations are feasible, although FEM cannot cover the full range handled by the present method. Good agreement is observed in the corresponding range. The matrix size in this method is smaller than that of other methods. Therefore, the present method is numerically efficient and could be applied to the integrated evaluation of successive discontinuities. The resonant characteristics of a rectangular waveguide cavity are analyzed accurately, taking the conductor losses into consideration.
Byeong-Sook BAE Gi-Hong IM Yoon-Ha JEONG
In this paper, a simple adaptive notch filter (ANF) scheme for reducing RFI over CAP/QAM-based VDSL systems is proposed. To alleviate the spectral null caused by notch filtering, a null reshaping scheme is introduced between the normal ANF and the decision feedback equalizer (DFE). The proposed filter scheme can control the width and depth of the null. The shallow and narrow null obtained by null reshaping reduces the loss of signal components and consequently improves the mean square error (MSE) at the output of the equalizer. The proposed null reshaping scheme also enables the infinite impulse response (IIR) type constrained ANF to have a smaller pole contraction factor α. This results in a fast convergence property in RFI frequency estimation with a recursive prediction error (RPE) algorithm. The performance variations of the proposed null reshaping are investigated with varying filter parameters. Compared to the conventional ANF, simulation results show that, at the expense of small system complexity, the proposed structure yields a 2-3 dB MSE gain and a fast convergence property for RFI estimation.
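The role of the pole contraction factor α can be illustrated with a minimal sketch of a second-order constrained IIR notch. The transfer function below is the common textbook constrained form and is an assumption here; the abstract does not give the authors' exact filter or the null reshaping stage, and the function name and test frequencies are hypothetical.

```python
import cmath
import math

def notch_response(omega, omega0, alpha):
    """Magnitude of H(e^{j*omega}) for a second-order constrained IIR notch.

    H(z) = (1 + a z^-1 + z^-2) / (1 + alpha*a z^-1 + alpha^2 z^-2),
    with a = -2 cos(omega0) and 0 < alpha < 1 the pole contraction factor.
    """
    a = -2.0 * math.cos(omega0)
    z1 = cmath.exp(-1j * omega)  # z^-1 evaluated on the unit circle
    num = 1.0 + a * z1 + z1 * z1
    den = 1.0 + alpha * a * z1 + (alpha ** 2) * z1 * z1
    return abs(num / den)

omega0 = 0.3 * math.pi  # hypothetical RFI frequency in radians per sample
at_notch = notch_response(omega0, omega0, 0.9)
near_narrow = notch_response(0.35 * math.pi, omega0, 0.99)
near_wide = notch_response(0.35 * math.pi, omega0, 0.80)
```

With α close to 1 the null is narrow and nearby signal components pass almost unchanged. A smaller α widens the null, which is the loss the abstract's null reshaping is meant to offset.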
Shouhao WU Wentao SONG Hanwen LUO
In this paper, a practical adaptive TuCM scheme is proposed, and its adaptation method is described. Taking hardware considerations into account, a suboptimal optimization algorithm in which the number of fading regions is variable is put forward. The proposed adaptive TuCM comes within 3 dB of the fading channel capacity, exhibits about 3 dB power gain over conventional adaptive TCM, and is easy to realize in hardware. Considering delay and channel estimation error, the BER performance of adaptive TuCM is analyzed and simulated. In the performance analysis, data fitting is applied to obtain the BER expression for TuCM, and a fitting mathematical model is proposed. Results show that adaptive TuCM is very sensitive to delay and channel estimation error. To alleviate these problems, we propose an improved power adaptation that can make adaptive TuCM practical.
Masaharu HYODO Masayoshi WATANABE
A new technique for optical generation of high-purity millimeter-wave (mm-wave) signals, based on synthesizing the outputs from multiple cascadingly phase-locked semiconductor lasers, was developed. First, a high-spectral-purity mm-wave signal was optically generated by heterodyning the outputs from two phase-locked external-cavity semiconductor lasers. The beat signal was detected by a p-i-n photodiode whose output was directly coupled to a coax-waveguide converter followed by a W-band harmonic mixer. By constructing an optical phase-locked loop (OPLL), a high-spectral-purity mm-wave signal with an electrical power of 2.3 µW was successfully generated at 110 GHz with an rms phase fluctuation of 57 mrad. Second, the frequency of the mm-wave signal was extended by use of three cascadingly phase-locked semiconductor lasers. This technique uses a semiconductor optical amplifier (SOA) to generate four-wave-mixing (FWM) signals as well as to amplify the input signals. When the three lasers were appropriately tuned, two pairs of FWM signals were nearly degenerate. By phase-locking the offset frequency in one of the nearly degenerate pairs, the frequency separations among the three lasers were kept at a ratio of 1:2. Thus, we successfully generated a high-purity millimeter-wave optical-beat signal at a frequency of 330.566 GHz with an rms phase fluctuation of 0.38 rad. A detailed analysis of the phase fluctuations was carried out on the basis of measured power spectral densities. The possibility of extending the mm-wave frequency up to 1 THz by using four cascadingly phase-locked lasers is also discussed.
To enable rapid diagnosis in the TDX signaling service, we have developed a PCM signal acquisition (PCMA) system that can analyze the status of signals sent from and received by signaling equipment in a fully electronic switching system. The system acquires the PCM signal of a selected channel from the subhighway (SHW) connecting a universal signal transceiver unit (USTU) and a time switch unit (TSU), classifies the signal type (R2MFC, DTMF, CCT, or VOICE), and finally discriminates the digit. This paper analyzes the signal status in the PCMA system using the quick Fourier transform (QFT), which is based on symmetry properties, and discusses the algorithm for signal analysis and discrimination. Experimental results show that the approach improves the performance of the PCMA system, reduces memory waste, and operates in real time.
Tae-Su KIM Bong-Seok KIM Seung-Jin KIM Byung-Ju KIM Kyung-Nam PARK Kuhn-Il LEE
This paper proposes a new multispectral image data compression algorithm that can efficiently reduce spatial and spectral redundancies by applying classified prediction, a Karhunen-Loeve transform (KLT), and the three-dimensional set partitioning in hierarchical trees (3-D SPIHT) algorithm in the wavelet transform (WT) domain. The classification is performed in the WT domain to exploit the interband classified dependency, while the resulting class information is used for the interband prediction. The residual image data on the prediction errors between the original image data and the predicted image data is decorrelated by a KLT. Finally, the 3-D SPIHT algorithm is used to encode the transformed coefficients listed in a descending order spatially and spectrally as a result of the WT and KLT. Simulation results showed that the reconstructed images after using the proposed algorithm exhibited a better quality and higher compression ratio than those using conventional algorithms.
Wen-Chung LIU Gin-Kou MA Shiunn-Jang CHERN
In this paper, a new simple space-time coding scheme is devised to enhance power efficiency, with application to an OFDM-based wireless LAN system. The basic idea takes the receiver's point of view and is referred to as Virtual Constellation Mapping (VCM). We designed a new combination of channel coding (a turbo code) with multiple transmit antennas (two antennas) to achieve transmit diversity and space division multiplexing transmission. Computer simulation results showed that, at the same transmission data rate, the proposed scheme achieves a better bit error rate (BER) than the conventional space-time trellis coded OFDM scheme in high Doppler fading channels.
Osamu ICHIKAWA Tetsuya TAKIGUCHI Masafumi NISHIMURA
It is believed that distant-talking speech recognition in a noisy environment requires a large-scale microphone array. However, this cannot fit into small consumer devices. Our objective is to improve the performance with a limited number of microphones (preferably only left and right). In this paper, we focused on a profile that is the shape of the power distribution according to the beamforming direction. An observed profile can be decomposed into known profiles for directional sound sources and a non-directional background sound source. Evaluations confirmed this method reduced the CER (Character Error Ratio) for the dictation task by more than 20% compared to a conventional 2-channel Adaptive Spectral Subtraction beamformer in a non-reverberant environment.
In this paper, a grey filtering approach based on the GM(1,1) model is proposed and applied to speech enhancement. The fundamental idea in the proposed grey filtering is to relate the estimation error of the GM(1,1) model to additive noise. The simulation results indicate that the additive noise can be estimated accurately by the proposed grey filtering approach with an appropriate scaling factor. Note that the spectral subtraction approach to speech enhancement depends heavily on the accuracy of the additive noise statistics, and that grey filtering is able to estimate additive noise appropriately. A magnitude spectral subtraction (MSS) approach for speech enhancement is proposed in which no mechanism for determining the non-speech and speech portions is required. Two examples are provided to justify the proposed MSS approach based on grey filtering. The simulation results show that the objective of speech enhancement has been achieved by the proposed MSS approach. In addition, the proposed MSS approach is compared with the HFR-based approach in [4] and the ZP approach in [5]. Simulation results indicate that in most cases the HFR-based and ZP approaches outperform the proposed MSS approach in SNRimp. However, the proposed MSS approach has better subjective listening quality than the HFR-based and ZP approaches.
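The idea of relating GM(1,1) model error to additive noise can be sketched as follows. This is a minimal illustration of the standard GM(1,1) least-squares fit, not the authors' filter: the function names and test series are hypothetical, the series is assumed strictly positive (signed speech samples would need an offset first), and the scaling factor mentioned in the abstract is omitted.

```python
import math

def fit_gm11(x0):
    """Least-squares estimate of (a, b) in the grey equation x0[k] + a*z1[k] = b."""
    # Accumulated generating operation: running sum of the raw series.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: means of consecutive accumulated samples.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    # Ordinary least squares for y = slope*z + b; the grey model uses a = -slope.
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    b = (sy - slope * sz) / n
    return -slope, b

def gm11_fitted(x0):
    """Raw-series values reconstructed from the fitted model (requires a != 0)."""
    a, b = fit_gm11(x0)
    c = (x0[0] - b / a) * (1.0 - math.exp(a))
    return [x0[0]] + [c * math.exp(-a * k) for k in range(1, len(x0))]

def gm11_residual(x0):
    """Model error x0 - fitted, the quantity the abstract relates to additive noise."""
    return [x - f for x, f in zip(x0, gm11_fitted(x0))]

series = [1.0, 1.1, 1.21, 1.331, 1.4641]  # hypothetical positive test data
a, b = fit_gm11(series)
residual = gm11_residual(series)
```

For a smooth, nearly exponential series such as this one the residual is small, so in this sketch any sizeable residual on real data would be attributed to the noise component.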
Kazuo ONOE Hiroyuki SEGI Takeshi KOBAYAKAWA Shoei SATO Shinichi HOMMA Toru IMAI Akio ANDO
In this paper, we propose a new technique of filter bank subtraction for robust speech recognition under various acoustic conditions. Spectral subtraction is a simple and useful technique for reducing the influence of additive noise. Conventional spectral subtraction assumes accurate estimation of the noise spectrum and no correlation between speech and noise. Those assumptions, however, are rarely satisfied in reality, leading to the degradation of speech recognition accuracy. Moreover, the recognition improvement attained by conventional methods is slight when the input SNR changes sharply. We propose a new method in which the output values of filter banks are used for noise estimation and subtraction. By estimating noise at each filter bank, instead of at each frequency point, the method alleviates the necessity for precise estimation of noise. We also take into consideration expected phase differences between the spectra of speech and noise in the subtraction and control a subtraction coefficient theoretically. Recognition experiments on test sets at several SNRs showed that the filter bank subtraction technique improved the word accuracy significantly and got better results than conventional spectral subtraction on all the test sets. In other experiments, on recognizing speech from TV news field reports with environmental noise, the proposed subtraction method yielded better results than the conventional method.
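Subtraction at the filter-bank level rather than per frequency bin can be sketched as below. This is a simplified illustration under stated assumptions: rectangular, non-overlapping bands stand in for whatever filter bank the authors use, and the fixed coefficients `alpha` and `beta` are hypothetical. The abstract's phase-difference-based control of the subtraction coefficient is not modeled.

```python
def band_energies(power_spectrum, edges):
    """Sum power-spectrum bins into bands; edges[i]:edges[i+1] delimits band i."""
    return [sum(power_spectrum[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]

def filterbank_subtract(noisy_bands, noise_bands, alpha=1.0, beta=0.1):
    """Subtract a scaled noise estimate per band, flooring at beta * noisy value."""
    return [max(y - alpha * n, beta * y) for y, n in zip(noisy_bands, noise_bands)]

# Hypothetical six-bin power spectra pooled into three two-bin bands.
noisy = band_energies([4.0, 6.0, 1.0, 1.0, 8.0, 2.0], edges=[0, 2, 4, 6])
noise = band_energies([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], edges=[0, 2, 4, 6])
clean = filterbank_subtract(noisy, noise, alpha=1.0, beta=0.1)
```

Pooling bins before subtraction averages out bin-level errors in the noise estimate, which matches the abstract's point that the method needs a less precise noise estimate. The floor keeps a band from going negative when the noise estimate exceeds the observation.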
Hidetoshi NAKASHIMA Yoshifumi CHISAKI Tsuyoshi USAGAWA Masanao EBATA
This paper addresses a single-channel speech enhancement method that utilizes the mean and variance of the logarithmic noise power spectra. An important issue for single-channel speech enhancement algorithms is determining the trade-off point between spectral distortion and residual noise, which requires accurate discrimination between speech and noise spectral components. Conventional methods determine the trade-off point using experimentally obtained parameters. As a result, spectral discrimination is inadequate, and the enhanced speech is degraded by spectral distortion or residual noise. Therefore, a criterion for determining the point is necessary. The proposed method determines the trade-off point between spectral distortion and residual noise level by discriminating between speech and noise spectral components based on statistical criteria. The spectral discrimination is performed using hypothesis testing that utilizes the means and variances of the logarithmic power spectra. The discriminated spectral components are divided into speech-dominant and noise-dominant ones. For the speech-dominant components, spectral subtraction is performed to minimize the spectral distortion. For the noise-dominant components, attenuation is performed to reduce the noise level. The performance of the method is confirmed in terms of waveform, spectrogram, noise reduction level, and a speech recognition task. The noise reduction level and speech recognition rate are improved, indicating that the method reduces musical noise effectively and improves the quality of the enhanced speech.
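The discriminate-then-process structure described above can be sketched per frequency bin. The one-sided threshold test, the threshold multiplier `k`, and the fixed `attenuation` value are assumptions made for illustration; the abstract does not specify the actual test statistic or significance level.

```python
import math

def enhance_bins(power, noise_log_mean, noise_log_std, noise_power,
                 k=2.0, attenuation=0.1):
    """Per bin: subtract noise if speech-dominant, otherwise attenuate."""
    out = []
    for p, mu, sigma, n in zip(power, noise_log_mean, noise_log_std, noise_power):
        # One-sided test of the log power against the noise statistics.
        speech_dominant = math.log(p) > mu + k * sigma
        if speech_dominant:
            out.append(max(p - n, 0.0))   # power spectral subtraction
        else:
            out.append(attenuation * p)   # suppress noise-dominant bin
    return out

# Hypothetical two-bin frame: one strong bin and one bin near the noise level.
enhanced = enhance_bins(power=[100.0, 1.2],
                        noise_log_mean=[0.0, 0.0],
                        noise_log_std=[0.5, 0.5],
                        noise_power=[1.0, 1.0])
```

Here the first bin lies well above the noise log-power statistics and is treated by subtraction, while the second falls below the threshold and is attenuated.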
Hiroshi HASEGAWA Yasuhiro MIKI Isao YAMADA Kohichi SAKANIWA
In this paper, we propose a novel higher order time-frequency distribution (GDH) for a discrete time signal. This distribution is defined over the original discrete time-frequency grids through a delicate discretization of an equivalent expression of a higher order distribution, for a continuous time signal, in [4]. We also present a constructive design method, for the kernel of the GDH, by which the distribution satisfies (i) the alias free condition as well as (ii) the marginal conditions. Numerical examples show that the proposed distributions reasonably suppress the artifacts which are observed severely in the Wigner distribution and its simple higher order generalization.