Arata KAWAMURA Hiro IGARASHI Youji IIGUNI
Image-to-sound mapping is a technique that transforms an image into a sound signal by treating the image as a sound spectrogram. In general, the transformed sound differs from a human speech signal. Herein, an efficient image-to-sound mapping method that provides an understandable speech signal without any training is proposed. To synthesize such a speech signal, the proposed method utilizes a multi-column image and a speech spectral phase that is obtained from a long-time observation of the speech. The original image can be retrieved from the sound spectrogram of the synthesized speech signal. The qualities of the synthesized speech and the reconstructed image are evaluated using objective tests.
Yuta TSUKAMOTO Arata KAWAMURA Youji IIGUNI
In this paper, a novel speech enhancement algorithm based on MAP estimation is proposed. The proposed speech enhancer adaptively changes the speech spectral density used in the MAP estimation according to the sum of the observed power spectra. In a speech segment, the speech spectral density approaches a Rayleigh distribution to preserve the quality of the enhanced speech, while in a non-speech segment it approaches an exponential distribution to reduce noise effectively. Furthermore, when the noise is super-Gaussian, we modify the width of the Gaussian model so that the modified model approximates the distribution of the super-Gaussian noise. This technique is effective in suppressing residual noise. Computer experiments confirm the effectiveness of the proposed method.
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
In this paper, we propose an image-to-sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking the inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded in the spectrogram to give the mapped sound speech intelligibility. Specifically, we retain the amplitude spectra of the speech signal where its power is strong and embed the image brightness in the other frequency bands. Retaining the strong-power amplitude spectra preserves the speech spectral envelope and improves the speech quality of the mapped sound. The weak-power amplitude spectra of the mapped sound represent the image brightness, so the image can be successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
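The basic mapping described above (image as spectrogram, inverse FFT per frame) can be sketched as follows. This is only a minimal illustration, not the authors' method: it omits the speech-spectrum embedding entirely, uses a random phase, and assumes non-overlapping rectangular frames. The function names `image_to_sound` and `sound_to_image` are hypothetical.

```python
import numpy as np

def image_to_sound(image, rng=None):
    """Map a non-negative image (rows = frequency bins, columns = frames) to a sound.

    Assumption: each column is used directly as one frame's amplitude spectrum.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_bins, n_frames = image.shape
    phase = rng.uniform(-np.pi, np.pi, size=image.shape)
    # DC and Nyquist bins must be real for an exact round trip through irfft.
    phase[0, :] = 0.0
    phase[-1, :] = 0.0
    spectrum = image * np.exp(1j * phase)
    frame_len = 2 * (n_bins - 1)
    frames = np.fft.irfft(spectrum, n=frame_len, axis=0)  # (frame_len, n_frames)
    return frames.T.reshape(-1)  # concatenate frames in time order

def sound_to_image(sound, n_bins):
    """Recover the image as the amplitude spectrogram of the sound."""
    frame_len = 2 * (n_bins - 1)
    frames = sound.reshape(-1, frame_len).T
    return np.abs(np.fft.rfft(frames, axis=0))

# Usage: a random "image" with 33 frequency bins and 10 frames.
demo_image = np.random.default_rng(1).random((33, 10))
demo_sound = image_to_sound(demo_image)
demo_recovered = sound_to_image(demo_sound, n_bins=33)
```

Because the frames do not overlap, the amplitude spectrogram of the sound reproduces the image up to floating-point error; the phase only affects how the sound is heard.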
Sayuri KOHMURA Arata KAWAMURA Youji IIGUNI
This paper proposes a noise reduction method for impact noise with damped oscillation caused by clinking a glass, hitting a bottle, and so on. The proposed method is based on the zero phase (ZP) signal, defined as the IDFT of the spectral amplitude. When the target noise can be modeled as the sum of an impact part and a damped oscillation part, the proposed method can reduce them individually. First, the proposed method estimates the damped oscillation spectra and subtracts them from the observed spectra. Then, the impact part is reduced by replacing several samples of the observed ZP signal. Simulation results show that the proposed method improves the SNR by 10 dB for real impact noise.
Weerawut THANHIKAM Arata KAWAMURA Youji IIGUNI
In this paper, we propose a speech enhancement algorithm that uses MAP estimation with a variable speech spectral amplitude probability density function (speech PDF). The variable speech PDF has two adaptive shape parameters that affect the quality of the enhanced speech. Noise can be efficiently suppressed when these parameters are chosen so that the shape of the variable speech PDF fits that of the real-speech PDF. We derive the adaptive shape parameters from the real-speech PDF in various narrow SNR intervals. The proposed speech enhancement algorithm with adaptive shape parameters is examined and compared to conventional algorithms. The simulation results show that the proposed method improves SegSNR by around 6 dB and 9 dB when the input speech signal is corrupted at 0 dB by white noise and tunnel noise, respectively.
Weerawut THANHIKAM Yuki KAMAMORI Arata KAWAMURA Youji IIGUNI
This paper proposes a wide-band noise reduction method using a zero phase (ZP) signal, which is defined as the IDFT of a spectral amplitude. When a speech signal is periodic over a short observation, the corresponding ZP signal is also periodic. On the other hand, when a noise spectral amplitude is approximately flat, its ZP signal takes nonzero values only around the origin. Hence, when a periodic speech signal is embedded in noise with a flat spectrum in an analysis frame, its ZP signal is periodic except around the origin. In the proposed noise reduction method, we replace the ZP signal around the origin with the ZP signal in the second or a later period, which gives an estimated speech ZP signal. The major advantages of this method are that it can reduce not only stationary but also non-stationary wide-band noise and that it does not require a prior estimate of the noise spectral amplitude. Simulation results show that the proposed noise reduction method improves the SNR by more than 5 dB for tunnel noise and 13 dB for clap noise in a low SNR environment.
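The ZP-signal idea above can be illustrated with a short sketch. It assumes the pitch period (in samples) is already known and that speech and noise contributions add in the ZP domain, which is only an approximation in practice because spectral amplitudes do not add exactly. The function names and the `period` and `width` parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def zero_phase_signal(frame):
    """ZP signal: inverse DFT of the spectral amplitude of one frame."""
    amplitude = np.abs(np.fft.rfft(frame))
    return np.fft.irfft(amplitude, n=len(frame))

def replace_around_origin(zp, period, width):
    """Overwrite the samples around the origin with those one period later.

    A ZP signal is even-symmetric (zp[n] == zp[-n]), so the samples just
    before the end of the frame are replaced as well.
    """
    n = len(zp)
    out = zp.copy()
    out[:width] = zp[period:period + width]
    if width > 1:
        out[n - width + 1:] = zp[n - period - width + 1:n - period]
    return out

# Usage: a frame that is exactly periodic (period 32) and a flat-spectrum impulse.
frame_len, period = 256, 32
speech_like = np.tile(np.random.default_rng(0).standard_normal(period), frame_len // period)
impulse = np.zeros(frame_len)
impulse[5] = 1.0

zp_speech = zero_phase_signal(speech_like)
zp_noise = zero_phase_signal(impulse)  # nonzero only at the origin
zp_estimate = replace_around_origin(zp_speech + zp_noise, period, width=3)
```

In this idealized setting the replacement removes the noise contribution at the origin and leaves the periodic part untouched.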
Yuki SATOMI Arata KAWAMURA Youji IIGUNI
For an adaptive system identification filter with a stochastic input signal, a coefficient vector updated with an NLMS algorithm converges in the sense of ensemble average and the expected convergence vector has been revealed. When the input signal is periodic, the convergence of the adaptive filter coefficients has also been proved. However, its convergence vector has not been revealed. In this paper, we derive the convergence vector of adaptive filter coefficients updated with the NLMS algorithm in system identification for deterministic sinusoidal inputs. Firstly, we derive the convergence vector when a disturbance does not exist. We show that the derived convergence vector depends only on the initial vector and the sinusoidal frequencies, and it is independent of the step-size for adaptation, sinusoidal amplitudes, and phases. Next, we derive the expected convergence vector when the disturbance exists. Simulation results support the validity of the derived convergence vectors.
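The disturbance-free claim above (the convergence vector does not depend on the step-size, amplitude, or phase) can be checked numerically with a small sketch. The unknown system `h_true`, the initial vector `w_init`, and the frequency are arbitrary illustrative values, and the function name is hypothetical; this is not the derivation from the paper.

```python
import numpy as np

def nlms_sinusoid(h, w0, omega, mu, amplitude=1.0, phase=0.0, n_iter=5000):
    """NLMS system identification with a single deterministic sinusoidal input."""
    h = np.asarray(h, dtype=float)
    w = np.array(w0, dtype=float)
    taps = np.arange(len(h))
    for n in range(n_iter):
        # Tap-delay-line input vector [x(n), x(n-1), ...] of the sinusoid.
        x = amplitude * np.cos(omega * (n - taps) + phase)
        error = (h - w) @ x  # desired output minus filter output, no disturbance
        w += mu * error * x / (x @ x + 1e-12)
    return w

h_true = np.array([1.0, -0.5, 0.25, 0.1])
w_init = np.array([0.3, 0.0, -0.2, 0.5])
omega = 0.3 * np.pi

w_small_step = nlms_sinusoid(h_true, w_init, omega, mu=0.2)
w_large_step = nlms_sinusoid(h_true, w_init, omega, mu=1.0, amplitude=3.0, phase=1.0)
```

With four taps and one sinusoid, the input vectors span only a two-dimensional subspace, so the filter does not converge to `h_true`; both runs nevertheless end at the same vector.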
Arata KAWAMURA Youji IIGUNI Yoshio ITOH
A parallel notch filter (PNF) for eliminating a sinusoidal signal whose frequency and phase are unknown has been proposed previously. The PNF achieves both fast convergence and high estimation accuracy when the step-size for adaptation is appropriately determined. However, there has been no discussion of how to determine the appropriate step-size. In this paper, we derive the convergence condition on the step-size and propose an adaptive algorithm with a variable step-size so that the convergence condition of the PNF is automatically satisfied. Moreover, we present a new filtering structure for the PNF that increases the convergence speed while maintaining the estimation accuracy. We also derive a variable step-size scheme for the new PNF to guarantee convergence. Simulation results show the effectiveness of the proposed method.
Kazuhiro MURAKAMI Arata KAWAMURA Yoh-ichi FUJISAKA Nobuhiko HIRUMA Youji IIGUNI
In this paper, we propose a real-time BSS (Blind Source Separation) system with two microphones that extracts only desired sound sources. Under the assumption that the desired sound sources are close to the microphones, the proposed BSS system suppresses distant sound sources as undesired sound sources. We previously developed a BSS system that can estimate the distance from a microphone to a sound source and suppress distant sound sources, but it was not a real-time processing system. The proposed BSS system is a real-time version of our previous BSS system. To develop it, we simplify some BSS procedures of the previous system. Simulation results show that the proposed system can effectively suppress the distant source signals in real time and has almost the same capability as the previous system.
Yosuke SUGIURA Arata KAWAMURA Youji IIGUNI
This paper proposes an adaptive comb filter with flexible notch gain that can appropriately remove periodic noise from an observed signal. The proposed adaptive comb filter uses a simple LMS algorithm to update the notch gain coefficient so that the noise is removed while the desired signal is preserved. Simulation results show the effectiveness of the proposed comb filter.
Hiroshi Ochi and Arata KAWAMURA
Naoto SASAOKA Keisuke SUMI Yoshio ITOH Kensaku FUJII Arata KAWAMURA
A noise reduction technique to reduce wideband and sinusoidal noise in noisy speech is proposed. In an actual environment, background noise includes not only wideband noise but also sinusoidal noise, such as ventilation fan and engine noise. In this paper, we propose a new noise reduction system which uses two types of adaptive line enhancers (ALEs) and a noise estimation filter (NEF). First, the two ALEs are used to estimate speech components. The first ALE is used to reduce sinusoidal noise superposed on speech and wideband noise, while the second ALE is used to reduce wideband noise superposed on speech. However, the quality of the speech enhanced by the two ALEs is not good enough because it is difficult to estimate unvoiced sound with them, so the NEF is used to improve the noise reduction capability. The NEF accurately estimates the background noise from the noise-dominated signal, which is obtained by subtracting the speech enhanced by the two ALEs from the noisy speech. The enhanced speech is obtained by subtracting the estimated noise from the noisy speech. Furthermore, a noise reduction system with a feedback path is proposed to further improve the quality of the enhanced speech.
Yosuke SUGIURA Arata KAWAMURA Youji IIGUNI
This paper proposes a new adaptive comb filter which automatically designs its characteristics. The comb filter is used to eliminate periodic noise from an observed signal. Three factors are important in designing the comb filter: the notch frequency, the notch gain, and the notch bandwidth. The notch frequencies are the null frequencies, which are aligned at equally spaced intervals. The notch gain controls how much of the observed signal is eliminated at the notch frequencies. The notch bandwidth controls the bandwidth over which the observed signal is eliminated around the notch frequencies. We have previously proposed a comb filter which can adjust the notch gain adaptively to eliminate the periodic noise. In this paper, to eliminate the periodic noise when its frequencies fluctuate, we propose a comb filter that adapts the notch gain and the notch bandwidth simultaneously. Simulation results show the effectiveness of the proposed adaptive comb filter.
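The three design factors named above can be made concrete with a generic comb notch filter. The transfer function below is a textbook-style construction chosen for illustration, not the structure proposed in the paper, and there is no adaptation: the delay `n_delay` sets the notch frequencies at multiples of 2*pi/n_delay, `notch_gain` sets the gain at those frequencies, and the pole radius parameter `a` (0 <= a < 1) sets the notch bandwidth. All names are hypothetical.

```python
import numpy as np

def comb_response(omega, n_delay, notch_gain, a):
    """Frequency response of H(z) = g + (1 - g) * (1 + a)/2 * (1 - z^-N) / (1 - a z^-N)."""
    z_inv_n = np.exp(-1j * n_delay * np.asarray(omega, dtype=float))
    notch = 0.5 * (1.0 + a) * (1.0 - z_inv_n) / (1.0 - a * z_inv_n)
    return notch_gain + (1.0 - notch_gain) * notch

n_delay = 8
notch_freqs = 2.0 * np.pi * np.arange(5) / n_delay           # 0, pi/4, ..., pi
mid_freqs = (2.0 * np.arange(4) + 1.0) * np.pi / n_delay     # halfway between notches

gain_at_notches = np.abs(comb_response(notch_freqs, n_delay, notch_gain=0.2, a=0.8))
gain_between = np.abs(comb_response(mid_freqs, n_delay, notch_gain=0.2, a=0.8))
```

In this construction the gain equals `notch_gain` at every notch frequency and 1 halfway between notches, while moving `a` closer to 1 narrows each notch.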
Arata KAWAMURA Kensaku FUJII Yoshio ITOH Yutaka FUKUI
A technique that uses a linear prediction error filter (LPEF) and an adaptive digital filter (ADF) to achieve noise reduction in speech degraded by additive background noise is proposed. It is known that the coefficients of the LPEF converge such that the prediction error signal becomes white. Since voiced speech can be represented as a stationary periodic signal over a short interval of time, most of the voiced speech is not included in the prediction error signal of the LPEF. On the other hand, when the input signal of the LPEF is background noise, the prediction error signal becomes white. Assuming that the background noise is generated by exciting a linear system with white noise, we can reconstruct the background noise from the prediction error signal by estimating the transfer function of the noise generation system. This estimation is performed by the ADF, which is used for system identification. Noise reduction is achieved by subtracting the noise reconstructed by the ADF from the speech degraded by additive background noise.
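The whitening property of the LPEF mentioned above can be observed with a small adaptive predictor. This sketch covers only that one property: the colored noise is an assumed AR(1) process, the predictor is updated with NLMS (an assumption; the abstract does not name the update rule), and the ADF stage is not included. Function names are hypothetical.

```python
import numpy as np

def nlms_prediction_error(x, order=4, mu=0.05, eps=1e-8):
    """Adaptive linear predictor; returns the prediction error and final coefficients."""
    w = np.zeros(order)
    error = np.zeros(len(x))
    for n in range(order, len(x)):
        past = x[n - order:n][::-1]  # x[n-1], x[n-2], ..., x[n-order]
        error[n] = x[n] - w @ past
        w += mu * error[n] * past / (past @ past + eps)
    return error, w

def lag1_correlation(v):
    """Normalized autocorrelation at lag 1 (close to 0 for a white signal)."""
    v = v - v.mean()
    return float(v[:-1] @ v[1:] / (v @ v))

# Colored background noise: white noise passed through an assumed AR(1) system.
rng = np.random.default_rng(0)
white = rng.standard_normal(20000)
colored = np.zeros_like(white)
for n in range(1, len(white)):
    colored[n] = 0.9 * colored[n - 1] + white[n]

pred_error, coeffs = nlms_prediction_error(colored)
input_corr = lag1_correlation(colored)
error_corr = lag1_correlation(pred_error[10000:])  # after the initial convergence
```

The input is strongly correlated from sample to sample, whereas the prediction error after convergence is close to uncorrelated.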
Akira SOGAMI Arata KAWAMURA Youji IIGUNI
We have previously proposed a howling canceller which cancels howling by using a cascade notch filter designed from the distance between a loudspeaker and a microphone. This method utilizes a pilot signal to estimate the distance. In this paper, we introduce two methods into the distance-based howling canceller to improve speech quality. The first is an adaptive cascade notch filter which adaptively adjusts the nulls to eliminate howling while preserving speech components. The second is a silent pilot signal whose frequencies lie in the ultrasonic band, so that it is inaudible during transmission. We implement the proposed howling canceller on a DSP to evaluate its capability. The experimental results show that the proposed howling canceller improves speech quality in comparison to the conventional one.
Naoto SASAOKA Eiji AKAMATSU Arata KAWAMURA Noboru HAYASAKA Yoshio ITOH
Speech enhancement has been proposed to reduce the impulsive noise whose frequency characteristic is wideband. On the other hand, it is challenging to reduce the ringing sound, which is narrowband in impulsive noise. Therefore, we propose the modeling of the ringing sound and its estimation by a linear predictor (LP). However, it is difficult to estimate the ringing sound only in noisy speech due to the auto-correlation property of speech. The proposed system adopts the 4th order moment-based adaptive algorithm by noticing the difference between the 4th order statistics of speech and impulsive noise. The brief analysis and simulation results show that the proposed system has the potential to reduce ringing sound while keeping the quality of enhanced speech.
Arata KAWAMURA Youji IIGUNI Yoshio ITOH
A noise reduction technique that uses linear prediction to remove noise components in speech signals has been proposed previously. The noise reduction works well for additive white noise signals, because the coefficients of the linear predictor converge such that the prediction error becomes white. In this method, the linear predictor is updated by a gradient-based algorithm with a fixed step-size. However, the optimal value of the step-size changes with the values of the prediction coefficients. In this paper, we propose a noise reduction system using a linear predictor with a variable step-size. The optimal value of the step-size also depends on the variance of the white noise, which is unknown. We therefore introduce a speech/non-speech detector and estimate the variance in non-speech segments, where the observed signal includes only noise components. The simulation results show that the noise reduction capability of the proposed system is better than that of the conventional one with a fixed step-size.
Akira SOGAMI Arata KAWAMURA Youji IIGUNI
In this paper, we propose a distance-based howling canceller with high speech quality. We have developed a distance-based howling canceller that uses only distance information, exploiting the property that howling occurs according to the distance between a loudspeaker and a microphone. This method estimates the distance by transmitting a pilot signal from the loudspeaker to the microphone. Multiple frequency candidates for each howling are computed from the estimated distance and eliminated by cascading notch filters that have nulls at those frequencies. However, speech quality is degraded at the howling canceller output. The first cause is shot noise that occurs at the beginning and end of the pilot signal transmission due to the discontinuous change in amplitude. We thus develop a new pilot signal that is robust against ambient noise, which allows us to reduce the shot noise effect by keeping its amplitude small. The second cause is speech degradation due to overlapping stopbands of the notch filters. We thus derive a condition on the bandwidths so that the stopbands do not overlap, and propose an adaptive bandwidth scheme which changes the bandwidth according to the distance.
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
The narrow bandwidth limitation of 300-3400 Hz on the public switched telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower band of 50-300 Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on the YIN algorithm, where threshold processing avoids false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of the 300-600 Hz band. In this case, since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure so as to avoid attenuation and overemphasis by increasing the weight when the first formant location is lower, and vice versa. Subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth-extended speech signal.
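The fundamental frequency detection step mentioned above can be sketched with the core of the YIN algorithm: the difference function, its cumulative mean normalization, and an absolute threshold that rejects unvoiced frames. This is a simplified illustration without parabolic interpolation or the paper's specific settings; the search range, the threshold of 0.1, and the function name are assumptions, and the test signal is full-band rather than telephone-band.

```python
import numpy as np

def yin_f0(frame, fs, fmin=50.0, fmax=400.0, threshold=0.1):
    """Return an F0 estimate in Hz, or None when no lag falls below the threshold."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame is too short for the requested fmin")
    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[:window] - frame[tau:tau + window]
        d[tau] = diff @ diff
    # Cumulative mean normalized difference function.
    cmnd = np.ones(tau_max + 1)
    running_sum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(running_sum, 1e-12)
    tau = tau_min
    while tau < tau_max:
        if cmnd[tau] < threshold:
            # Walk down to the local minimum that follows the threshold crossing.
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return fs / tau
        tau += 1
    return None  # treated as unvoiced

# Usage: a synthetic voiced frame (120 Hz with five harmonics) and a noise frame.
fs = 8000
t = np.arange(1024) / fs
voiced = sum(np.sin(2 * np.pi * 120 * h * t) / h for h in range(1, 6))
unvoiced = np.random.default_rng(0).standard_normal(1024)

f0_voiced = yin_f0(voiced, fs)
f0_unvoiced = yin_f0(unvoiced, fs)
```

The estimate is limited to integer lags, so it is only accurate to within a few hertz at this sampling rate; the noise frame never crosses the threshold and is reported as unvoiced.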
Yosuke SUGIURA Arata KAWAMURA Youji IIGUNI
This paper proposes a comb filter design method which utilizes two linear phase FIR filters for flexibly adjusting the comb filter's frequency response. The first FIR filter is used to individually adjust the notch gains, which denote the local minimum gains of the comb filter's frequency response. The second FIR filter is used to design the elimination bandwidths for individual notch gains. We also derive an efficient comb filter by incorporating these two FIR filters with an all-pass filter which is used in a conventional comb filter to accurately align the nulls with the undesired harmonic frequencies. Several design examples of the derived comb filter show the effectiveness of the proposed comb filter design method.