Jae-Seong LEE Chang-Joon LEE Young-Cheol PARK Dae-Hee YOUN
This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients using MDCT and MDST coefficients through circular convolution. The complexity of the MDCT and MDST coefficients is approximately half of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
As the need for underwater communication has recently grown, an acoustic modem has become more necessary for the sensor nodes to perform effective underwater communication. To develop acoustic modems for effective underwater communication, some limitations must be overcome, such as the limited power supply and high cost of commercial acoustic modems. Recently, low-power, low-cost acoustic modems have been developed. However, the data rates of these modems are very low. The objective of this work is to develop an acoustic modem capable of supporting high data rates. We introduce a coherent acoustic modem that uses waterproof ultrasonic sensors to process acoustic waves. The proposed modem is based on a low-power, low-cost, short-range concept, and it also supports a high data rate as confirmed by underwater experiments. Experimental results show that our modem has the best performance among all recently developed low-power modems.
Rattapol THOONSAENGNGAM Nisachon TANGSANGIUMVISAI
This paper proposes an enhanced method for estimating the a priori Signal-to-Disturbance Ratio (SDR) to be employed in the Acoustic Echo and Noise Suppression (AENS) system for full-duplex hands-free communications. The proposed a priori SDR estimation technique is modified based upon the Two-Step Noise Reduction (TSNR) algorithm to suppress the background noise while preserving speech spectral components. In addition, a practical approach to accurately determine the Echo Spectrum Variance (ESV) is presented based upon the assumption of a linear relationship between the power spectra of the far-end speech and acoustic echo signals. The ESV estimation technique is then employed to alleviate the acoustic echo problem. The performance of the AENS system that employs these two proposed estimation techniques is evaluated through the Echo Attenuation (EA), Noise Attenuation (NA), and two speech distortion measures. Simulation results based upon real speech signals confirm that our improved AENS system is able to efficiently mitigate the problem of acoustic echo and background noise, while preserving the speech quality and speech intelligibility.
This letter proposes a windowing frequency-domain adaptive algorithm that reuses the filtering error to apply a window function symmetrically in the filter update. By using a proper window function to reduce the negative influence of spectral leakage, the proposed algorithm can significantly improve the performance of acoustic echo cancellation for speech signals.
Yoonjae LEE Kihyeon KIM Jongsung YOON Hanseok KO
A simple and novel residual acoustic echo cancellation method that employs binary masking is proposed to enhance the speech quality of hands-free communication in an automobile environment. In general, the W-disjoint orthogonality assumption is used for blind source separation using multiple microphones. However, in this Letter, it is utilized to mask the residual echo component in the time-frequency domain using a single microphone. The experimental results confirm the effectiveness of the proposed method in terms of the echo return loss enhancement and speech enhancement.
In this letter, an acoustic environment classification algorithm based on the 3GPP2 selectable mode vocoder (SMV) is proposed for context-aware mobile phones. Classification of the acoustic environment is performed based on a Gaussian mixture model (GMM) using coding parameters of the SMV extracted directly from the encoding process of the acoustic input data in the mobile phone. Experimental results show that the proposed environment classification algorithm provides superior performance over a conventional method in various acoustic environments.
Karthik MURALIDHAR Kwok Hung LI Sapna GEORGE
To attain good performance in an acoustic echo cancellation system, it is important to have a variable step size (VSS) algorithm as part of an adaptive filter. In this paper, we are concerned with the development of a VSS algorithm for a recently proposed subband affine projection (SAP) adaptive filter. Two popular VSS algorithms in the literature are the methods of delayed coefficients (DC) and variable regularization (VR). However, their merits and demerits are mutually exclusive. We propose a VSS algorithm that is a hybrid of both methods and combines their advantages. An extensive study of the new algorithm is conducted in different scenarios, such as the presence of double-talk (DT) during the transient phase of the adaptive filter, DT during steady state, and varying DT power, and reasoning is given to support the observed behavior. The importance of the method of VR as part of a VSS algorithm is emphasized.
This Letter proposes an optimal gain filter for the perceptual acoustic echo suppressor. We designed an optimally-modified log-spectral amplitude estimation algorithm for the gain filter in order to achieve robust suppression of echo and noise. A new parameter including information about interferences (echo and noise) of single-talk duration is statistically analyzed, and then the speech absence probability and the a posteriori SNR are judiciously estimated to determine the optimal solution. The experiments show that the proposed gain filter attains a significantly improved reduction of echo and noise with less speech distortion.
Kensaku FUJII Ryo AOKI Mitsuji MUNEYASU
This paper proposes an adaptive algorithm for identifying unknown systems containing nonlinear amplitude characteristics. Usually, the nonlinearity is so small as to be negligible. However, in low-cost systems, such as an acoustic echo canceller using a small loudspeaker, the nonlinearity degrades the identification performance. Several methods for preventing this degradation, such as polynomial or Volterra series approximations, have therefore been proposed and studied. However, these conventional methods incur a high processing cost. In this paper, we propose a method that approximates the nonlinear characteristics with a piecewise linear curve and show through computer simulations that the performance can be greatly improved. The proposed method also limits the processing cost to only about twice that of the linear adaptive filter system.
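The idea of approximating a nonlinear amplitude characteristic with a piecewise linear curve can be illustrated with a minimal sketch. It is not the adaptive algorithm of the paper: the `tanh` saturation curve, the breakpoint spacing of 0.5, and the helper name `piecewise_linear` are all assumptions chosen for illustration.

```python
import math
from bisect import bisect_right


def piecewise_linear(breakpoints, values):
    """Return f(x) that linearly interpolates (breakpoints[i], values[i])."""
    def f(x):
        # Clamp outside the covered range.
        if x <= breakpoints[0]:
            return values[0]
        if x >= breakpoints[-1]:
            return values[-1]
        i = bisect_right(breakpoints, x) - 1  # segment containing x
        x0, x1 = breakpoints[i], breakpoints[i + 1]
        t = (x - x0) / (x1 - x0)
        return values[i] + t * (values[i + 1] - values[i])
    return f


# Hypothetical loudspeaker saturation curve (an assumption, not from the paper).
breakpoints = [-2.0 + 0.5 * i for i in range(9)]  # -2.0, -1.5, ..., 2.0
values = [math.tanh(b) for b in breakpoints]
approx = piecewise_linear(breakpoints, values)

# Worst-case deviation from the true curve on a fine grid.
grid = [-2.0 + 0.01 * i for i in range(401)]
max_error = max(abs(approx(x) - math.tanh(x)) for x in grid)
print(f"max approximation error: {max_error:.4f}")
```

With only nine breakpoints the curve is matched closely, which suggests why a piecewise linear model can be much cheaper than a high-order polynomial or Volterra expansion.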
Kenta NIWA Takanori NISHINO Kazuya TAKEDA
A sound field reproduction method is proposed that uses blind source separation and a head-related transfer function. In the proposed system, multichannel acoustic signals captured at distant microphones are decomposed into a set of location/signal pairs of virtual sound sources based on frequency-domain independent component analysis. After estimating the locations and the signals of the virtual sources by convolving the controlled acoustic transfer functions with each signal, the spatial sound is constructed at the selected point. In experiments, a sound field made by six sound sources is captured using 48 distant microphones and decomposed into sets of virtual sound sources. Since subjective evaluation shows no significant difference between natural and reconstructed sound when six virtual sources are used, the effectiveness of the decomposing algorithm as well as the virtual source representation is confirmed.
Hyunho KANG Koutarou YAMAGUCHI Brian KURKOSKI Kazuhiko YAMAGUCHI Kingo KOBAYASHI
For the digital watermarking patchwork algorithm originally given by Bender et al., this paper proposes two improvements applicable to audio watermarking. First, the watermark embedding strength is psychoacoustically adapted, using the Bark frequency scale. Second, whereas previous approaches leave the samples that do not correspond to the data untouched, in this paper, these are modified to reduce the probability of misdetection, a method called full index embedding. In simulations, the combination of these two proposed methods has higher resistance to a variety of attacks than prior algorithms.
Masahito TOGAMI Yasunari OBUCHI
We propose a new methodology of DOA (direction of arrival) estimation named SPIRE (Stepwise Phase dIfference REstoration) that is able to estimate sound source directions even if there is more than one source in a reverberant environment. DOA estimation in reverberant environments is difficult because the variance of the direction of an estimated sound source increases in reverberant environments. Therefore, we want the distance between microphones to be long. However, because of the spatial aliasing problem, the distance cannot be longer than half the wavelength of the maximum frequency of a source. DOA estimation performance of SPIRE is not limited by the spatial aliasing problem. The major feature of SPIRE is restoration of the phase difference of a microphone pair (M1) by using the phase difference of another microphone pair (M2) under the condition that the distance between the M1 microphones is longer than the distance between the M2 microphones. This restoration process enables the reduction of the variance of an estimated sound source direction and can alleviate the spatial aliasing problem that occurs with the M1 phase difference using direction estimation of the M2 microphones. The experimental results in a reverberant environment (reverberation time of about 300 ms) indicate that even when there are multiple sources, the proposed method can estimate the source direction more accurately than conventional methods. In addition, DOA estimation performance of SPIRE with an array length of 0.2 m is shown to be almost equivalent to that of GCC-PHAT with an array length of 0.5 m. SPIRE can thus execute DOA estimation with a smaller microphone array than GCC-PHAT. From the viewpoint of the hardware size and coherence problem, the array length is required to be as small as possible, so this feature of SPIRE is preferable.
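The half-wavelength constraint mentioned in this abstract can be made concrete with a short calculation. The sketch below is only a worked illustration of the spatial aliasing limit, not part of SPIRE; the speed of sound (343 m/s) and the 4 kHz maximum frequency are assumed values, and the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def max_spacing(f_max_hz, c=SPEED_OF_SOUND):
    """Largest microphone spacing (m) free of spatial aliasing up to f_max_hz."""
    wavelength = c / f_max_hz
    return wavelength / 2.0


def max_alias_free_frequency(spacing_m, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) a pair with the given spacing resolves unambiguously."""
    return c / (2.0 * spacing_m)


# Assumed 4 kHz upper frequency for a speech source.
print(f"spacing limit at 4 kHz: {max_spacing(4000.0) * 100:.2f} cm")

# The two array lengths quoted in the abstract, treated as pair spacings.
for length in (0.2, 0.5):
    print(f"{length} m pair: alias-free up to {max_alias_free_frequency(length):.1f} Hz")
```

Under these assumptions a conventional pair must be only a few centimetres apart to cover the speech band, which is why restoring the phase difference of a longer pair from a shorter one is attractive.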
Yoonjae LEE Seokyeong JEONG Hanseok KO
A residual acoustic echo cancellation method that employs the masking property is proposed to enhance the speech quality of hands-free communication devices in an automobile environment. The conventional masking property is employed for speech enhancement using the masking threshold of the desired clean speech signal. In this Letter, either the near-end speech or residual noise is selected as the desired signal according to the double-talk detector. Then, the residual echo signal is masked by the desired signal (masker). Experiments confirm the effectiveness of the proposed method by deriving the echo return loss enhancement and by examining speech waveforms and spectrograms.
Yusuke NAKASHIMA Hosei MATSUOKA Takeshi YOSHIMURA Hiroshi MIURA Seiichi NAKAJIMA Masanori MACHIDA Gen-ichiro OHTA
Data transmission via an audio link on an AM radio system is shown to be achievable using Acoustic OFDM. We employ Acoustic OFDM to embed data in audio content that is then broadcast as AM radio signals. We tuned the parameters and performed experiments. Text data such as a URL can be delivered to a mobile phone through the existing MF AM radio system and radios.
Umut YUNUS Masaru TSUNASAKI Yiwei HE Masanobu KOMINAMI Katsumi YAMASHITA
Gas or water leaks in pipes that are buried underground or that are situated in the walls of buildings may occur due to aging or unpredictable accidents, such as earthquakes. Therefore, the detection of leaks in pipes is an important task and has been investigated extensively. In the present paper, we propose a novel leak detection method by means of acoustic waves. We inject an acoustic chirp signal into a target pipeline and then estimate the leak location from the delay time of the compressed pulse obtained by passing the reflected signal through a correlator. In order to distinguish a leak reflection in a complicated pipeline arrangement, the reflection characteristics of leaks are carefully examined through numerical simulations and experiments. There is a remarkable difference in the reflection characteristics between a leak and other types of discontinuity, and this property can be utilized to distinguish the leak reflection. The experimental results show that, even in a complicated pipe arrangement including bends and branches, the proposed approach can successfully detect leaks. Furthermore, the proposed approach is low in cost and easy to implement because only a personal computer and some common equipment are required.
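The delay estimation step described in this abstract (chirp injection followed by a correlator) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation: the sample rate, chirp band, reflection gain of 0.3, noise-free signal, and sound speed of 343 m/s are all hypothetical, and the function names are invented for the example.

```python
import math


def linear_chirp(n_samples, fs, f0, f1):
    """Linear chirp sweeping from f0 to f1 Hz over n_samples at sample rate fs."""
    duration = n_samples / fs
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n_samples))]


def cross_correlate(received, template):
    """Correlate the template against the received signal at each lag >= 0."""
    n_lags = len(received) - len(template) + 1
    return [sum(received[lag + i] * template[i] for i in range(len(template)))
            for lag in range(n_lags)]


def estimate_delay_samples(received, template):
    """Lag of the compressed pulse, i.e. the strongest correlation magnitude."""
    corr = cross_correlate(received, template)
    return max(range(len(corr)), key=lambda lag: abs(corr[lag]))


fs = 8000.0                                   # assumed sample rate (Hz)
chirp = linear_chirp(200, fs, 500.0, 3000.0)  # assumed probe signal
true_delay = 120                              # round-trip delay in samples
received = [0.0] * 400
for i, s in enumerate(chirp):
    received[true_delay + i] += 0.3 * s       # attenuated reflection

estimated = estimate_delay_samples(received, chirp)
sound_speed = 343.0                           # assumed speed in the pipe (m/s)
distance = sound_speed * (estimated / fs) / 2  # halve the round trip
print(estimated, round(distance, 4))
```

The division by two converts the round-trip delay into a one-way distance to the reflecting discontinuity. Distinguishing a leak from a bend or branch, which the paper addresses through reflection characteristics, is outside the scope of this sketch.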
Hosei MATSUOKA Yusuke NAKASHIMA Takeshi YOSHIMURA
This paper presents a technology for short-range communications using sound waves, in which the modulated data signal can be transmitted in parallel with regular audio without significantly degrading the quality of the sound. The technology, which we call Acoustic OFDM, replaces the high frequency band of the audio signal with OFDM carriers, each of which is power-controlled according to the spectrum envelope of the original audio signal. It can provide data transmission of several hundred bps. The implemented Acoustic OFDM system enables the transmission of short text messages from loudspeakers to mobile devices at a distance of around 3 m.
Suehiro SHIMAUCHI Yoichi HANEDA Akitoshi KATAOKA
We propose a new robust frequency domain acoustic echo cancellation filter that employs a normalized residual echo enhancement. By interpreting the conventional robust step-size control approaches as a statistical-model-based residual echo enhancement problem, the optimal step-size introduced in most conventional approaches is regarded as optimal only on the assumption that both the residual echo and the outlier in the error output signal are described by Gaussian distributions. However, the Gaussian-Gaussian mixture assumption does not always hold well, especially when both the residual echo and the outlier are speech signals (known as a double-talk situation). The proposed filtering scheme is based on the Gaussian-Laplacian mixture assumption for the signals normalized by the reference input signal amplitude. By comparing the performance of the proposed and conventional approaches through simulations, we show that the Gaussian-Laplacian mixture assumption for the normalized signals can provide a better control scheme for acoustic echo cancellation.
Satoshi OHTA Yoshinobu KAJIKAWA Yasuo NOMURA
In the acoustic echo canceller (AEC), the step-size parameter of the adaptive filter must be varied according to the situation if double talk occurs and/or the echo path changes. We propose an AEC that uses a sub-adaptive filter. The proposed AEC can control the step-size parameter according to the situation. Moreover, it offers superior convergence compared to the conventional AEC even when the double talk and the echo path change occur simultaneously. Simulations demonstrate that the proposed AEC can achieve higher ERLE and faster convergence than the conventional AEC. The computational complexity of the proposed AEC can be reduced by reducing the number of taps of the sub-adaptive filter.
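To show where the step-size parameter enters an adaptive echo canceller and how ERLE is measured, here is a minimal sketch of a plain NLMS filter with a fixed step size. It does not implement the sub-adaptive filter or the step-size control proposed in the paper; the echo path, noise level, white far-end signal, and all names are assumptions made for the example.

```python
import math
import random


def nlms(far_end, mic, n_taps, mu, eps=1e-8):
    """Normalized LMS echo canceller; returns the residual signal."""
    w = [0.0] * n_taps    # adaptive filter coefficients
    buf = [0.0] * n_taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wk * xk for wk, xk in zip(w, buf))  # echo estimate
        e = d - y                                   # residual after cancellation
        norm = sum(xk * xk for xk in buf) + eps
        # mu is the step size: larger adapts faster, smaller is more robust.
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, buf)]
        residual.append(e)
    return residual


def erle_db(mic, residual):
    """Echo return loss enhancement in dB over the given segment."""
    return 10 * math.log10(sum(d * d for d in mic) / sum(e * e for e in residual))


rng = random.Random(0)
echo_path = [0.5, -0.3, 0.2, 0.1]  # hypothetical room response
far_end = [rng.uniform(-1, 1) for _ in range(3000)]
mic = []
for n in range(len(far_end)):
    echo = sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    mic.append(echo + rng.uniform(-1e-3, 1e-3))  # small background noise

residual = nlms(far_end, mic, n_taps=4, mu=0.5)
erle = erle_db(mic[-1000:], residual[-1000:])  # measured after convergence
print(f"ERLE: {erle:.1f} dB")
```

In a double-talk situation the near-end speech appears in `mic` as a large disturbance, so a fixed `mu` of this size would let the coefficients diverge; that is the situation a step-size controller is meant to handle.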
Goshu NAGINO Makoto SHOZAKAI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO
This paper proposes a technique for building an effective speech corpus at lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models in two-dimensional space. First, a small number of voice samples per speaker is collected, and speaker-adapted acoustic models trained with the collected utterances are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. This means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. This means that a cost reduction of more than 57% was achieved.
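The two cost-reduction figures can be checked with simple arithmetic, under the assumption (not stated explicitly in the abstract) that collection cost is proportional to the number of speakers. The function name is hypothetical.

```python
def cost_reduction(selected, baseline):
    """Fraction of speakers saved when `selected` speakers match `baseline`."""
    return 1.0 - selected / baseline


isolated = cost_reduction(200, 533)     # isolated-word corpus
continuous = cost_reduction(500, 1179)  # continuous word corpus
print(f"isolated-word:   {isolated:.1%}")
print(f"continuous word: {continuous:.1%}")
```

Both values sit just above the thresholds quoted in the abstract (more than 62% and more than 57%).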
Noriaki MURAKOSHI Akinori NISHIHARA
This paper presents a novel stereophonic acoustic echo canceling scheme without preprocessing. To accurately estimate the echo path while maintaining a high level of echo-erasing performance, this scheme uses two filters: one serves as a guideline that does not erase echo but helps update the other filter, which actually erases echo. In addition, we propose a new filter dividing technique to apply to the filter divide scheme, and utilize this as the guideline. Numerical examples demonstrate that the proposed scheme improves the convergence behavior compared to conventional methods in both system mismatch (i.e., normalized coefficient error) and Echo Return Loss Enhancement (ERLE).