Yosuke SUGIURA Arata KAWAMURA Youji IIGUNI
This paper proposes a new adaptive comb filter that automatically designs its own characteristics. A comb filter is used to eliminate periodic noise from an observed signal. Its design involves three important factors: the notch frequency, the notch gain, and the notch bandwidth. The notch frequencies are the null frequencies, which are aligned at equally spaced intervals. The notch gain controls how much of the observed signal is eliminated at the notch frequencies. The notch bandwidth controls the bandwidth over which the observed signal is eliminated around the notch frequencies. We have previously proposed a comb filter that adaptively adjusts the notch gain to eliminate periodic noise. In this paper, to eliminate periodic noise whose frequencies fluctuate, we propose a comb filter that adapts the notch gain and the notch bandwidth simultaneously. Simulation results show the effectiveness of the proposed adaptive comb filter.
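The roles of the notch gain and the notch bandwidth can be illustrated with a generic all-pass-based comb filter. This is only a sketch under assumptions: the transfer function H(z) = (1 + g)/2 - (1 - g)/2 * A(z), with A(z) = (z^-N - r) / (1 - r z^-N), is a textbook form and is not claimed to be the authors' filter, and the names `comb_response`, `notch_gain`, and `r` are hypothetical.

```python
import cmath
import math


def comb_response(omega, period, notch_gain, r):
    """Frequency response of an assumed all-pass-based comb filter.

    omega      : angular frequency in rad/sample
    period     : N, notches fall at multiples of 2*pi/N
    notch_gain : g in [0, 1], the gain left at each notch frequency
    r          : value in [0, 1); closer to 1 gives narrower notches
    """
    z_inv_n = cmath.exp(-1j * omega * period)
    allpass = (z_inv_n - r) / (1 - r * z_inv_n)
    return (1 + notch_gain) / 2 - (1 - notch_gain) / 2 * allpass


N, g = 8, 0.2
# Exactly on the third notch frequency the gain equals the notch gain g.
at_notch = abs(comb_response(2 * math.pi * 3 / N, N, g, 0.9))
# Halfway between two notches the signal passes with unit gain.
between = abs(comb_response(2 * math.pi * 3.5 / N, N, g, 0.9))
# Slightly off the notch: a narrow notch (r = 0.9) attenuates less
# than a wide notch (r = 0.5) at the same frequency offset.
narrow = abs(comb_response(2 * math.pi * 3.05 / N, N, g, 0.9))
wide = abs(comb_response(2 * math.pi * 3.05 / N, N, g, 0.5))
```

In an adaptive version, `notch_gain` and `r` would be the two quantities updated at run time; the update rules themselves are not given in the abstract and are not sketched here.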
Ivan SETIAWAN Youji IIGUNI Hajime MAEDA
In this paper, a new approach to adaptive direction-of-arrival (DOA) estimation based upon a database retrieval technique is proposed. In this method, angles and signal powers are quantized, and a set of true correlation vectors of the array antenna input vectors for various combinations of the quantized angles and signal powers is stored in a database. The k-d tree is selected as the data structure to facilitate range searching. Once a correlation vector has been estimated, range searching is performed to retrieve several correlation vectors close to it from the k-d tree. The DOA and the signal power are estimated by taking the weighted average of the angles and powers associated with the retrieved correlation vectors. Unlike other high-resolution methods, this method requires no eigenvalue computation, thus allowing fast computation. Simulation results show that the proposed method is much faster than root-MUSIC, which requires an eigenvalue decomposition.
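The retrieval step can be sketched with a small k-d tree. Everything specific here is an assumption made for illustration: the two-dimensional toy "correlation vector" (one source, two sensors at half-wavelength spacing, no noise), the quantization grid, the search radius, and the inverse-distance weights. The abstract does not state which weights the authors use.

```python
import math


def correlation_feature(angle_deg, power):
    """Toy correlation vector: real and imaginary part of p*exp(j*pi*sin(theta))."""
    phase = math.pi * math.sin(math.radians(angle_deg))
    return (power * math.cos(phase), power * math.sin(phase))


def build_kdtree(entries, depth=0):
    """entries: list of (vector, (angle, power)); splits on alternating axes."""
    if not entries:
        return None
    axis = depth % 2
    entries = sorted(entries, key=lambda e: e[0][axis])
    mid = len(entries) // 2
    return {"entry": entries[mid], "axis": axis,
            "left": build_kdtree(entries[:mid], depth + 1),
            "right": build_kdtree(entries[mid + 1:], depth + 1)}


def range_search(node, query, radius, found):
    """Collect (distance, label) for all stored vectors within radius of query."""
    if node is None:
        return
    vec, label = node["entry"]
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, query)))
    if dist <= radius:
        found.append((dist, label))
    diff = query[node["axis"]] - vec[node["axis"]]
    if diff <= radius:       # the query ball reaches into the left half-space
        range_search(node["left"], query, radius, found)
    if diff >= -radius:      # the query ball reaches into the right half-space
        range_search(node["right"], query, radius, found)


def estimate(tree, measured, radius):
    """Weighted average of the angles and powers of the retrieved entries."""
    found = []
    range_search(tree, measured, radius, found)
    if not found:
        raise ValueError("no database entry within the search radius")
    weights = [1.0 / (d + 1e-12) for d, _ in found]
    total = sum(weights)
    angle = sum(w * lab[0] for w, (_, lab) in zip(weights, found)) / total
    power = sum(w * lab[1] for w, (_, lab) in zip(weights, found)) / total
    return angle, power


# Database over quantized angles (1 degree steps) and powers (0.25 steps).
database = [(correlation_feature(a, p), (a, p))
            for a in range(-60, 61) for p in (0.5, 0.75, 1.0, 1.25, 1.5)]
tree = build_kdtree(database)
angle_hat, power_hat = estimate(tree, correlation_feature(12.3, 1.0), radius=0.05)
```

No eigenvalue decomposition appears anywhere in this path; the cost is dominated by the tree traversal.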
Makoto NAKASHIZUKA Hidenari NISHIURA Youji IIGUNI
In this study, we introduce shift-invariant sparse image representations using tree-structured dictionaries. Sparse coding is a generative signal model that approximates signals by the linear combinations of atoms in a dictionary. Since a sparsity penalty is introduced during signal approximation and dictionary learning, the dictionary represents the primal structures of the signals. Under the shift-invariance constraint, the dictionary comprises translated structuring elements (SEs). The computational cost and number of atoms in the dictionary increase along with the increasing number of SEs. In this paper, we propose an algorithm for shift-invariant sparse image representation, in which SEs are learnt with a tree-structured approach. By using a tree-structured dictionary, we can reduce the computational cost of the image decomposition to the logarithmic order of the number of SEs. We also present the results of our experiments on the SE learning and the use of our algorithm in image recovery applications.
Akira SOGAMI Arata KAWAMURA Youji IIGUNI
We have previously proposed a howling canceller which cancels howling by using a cascade notch filter designed from the distance between a loudspeaker and a microphone. This method utilizes a pilot signal to estimate the distance. In this paper, we introduce two methods into the distance-based howling canceller to improve speech quality. The first is an adaptive cascade notch filter which adaptively adjusts the nulls to eliminate howling while preserving speech components. The second is a silent pilot signal whose frequencies lie in the ultrasonic band, which makes it inaudible during transmission. We implement the proposed howling canceller on a DSP to evaluate its capability. The experimental results show that the proposed howling canceller improves speech quality in comparison to the conventional one.
Arata KAWAMURA Youji IIGUNI Yoshio ITOH
A noise reduction technique that uses linear prediction to remove noise components in speech signals has been proposed previously. The noise reduction works well for additive white noise, because the coefficients of the linear predictor converge such that the prediction error becomes white. In this method, the linear predictor is updated by a gradient-based algorithm with a fixed step-size. However, the optimal value of the step-size changes with the values of the prediction coefficients. In this paper, we propose a noise reduction system using a linear predictor with a variable step-size. The optimal value of the step-size also depends on the variance of the white noise; however, the variance is unknown. We therefore introduce a speech/non-speech detector, and estimate the variance in non-speech segments where the observed signal includes only noise components. The simulation results show that the noise reduction capability of the proposed system is better than that of the conventional one with a fixed step-size.
Akira SOGAMI Arata KAWAMURA Youji IIGUNI
In this paper, we propose a distance-based howling canceller with high speech quality. We have developed a distance-based howling canceller that uses only distance information, based on the property that howling occurs according to the distance between a loudspeaker and a microphone. This method estimates the distance by transmitting a pilot signal from the loudspeaker to the microphone. Multiple frequency candidates for each howling are computed from the estimated distance and eliminated by cascading notch filters that have nulls at those frequencies. However, speech quality degrades at the howling canceller output. The first cause is shot noise that occurs at the beginning and end of the pilot signal transmission due to the discontinuous change in amplitude. We thus develop a new pilot signal that is robust against ambient noise. We can then reduce the shot noise effect by keeping the amplitude small. The second cause is speech degradation caused by overlapping stopbands of the notch filters. We thus derive a condition on the bandwidths so that the stopbands do not overlap, and propose an adaptive bandwidth scheme which changes the bandwidth according to the distance.
Makoto NAKASHIZUKA Yu ASHIHARA Youji IIGUNI
This paper proposes an adaptation method for structuring elements of morphological filters. A structuring element of a morphological filter specifies a shape of local structures that is eliminated or preserved in the output. The adaptation of the structuring element is hence a crucial problem for image denoising using morphological filters. Existing adaptation methods for structuring elements require preliminary training using example images. We propose an adaptation method for structuring elements of morphological opening filters that does not require such training. In our approach, the opening filter is interpreted as an approximation method with the union of the structuring elements. In order to eliminate noise components, a penalty defined from an assumption of image smoothness is imposed on the structuring element. Image denoising is achieved through decreasing the objective function, which is the sum of an approximation error term and the penalty function. In experiments, we use the proposed method to demonstrate positive impulsive noise reduction from images.
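How an opening filter removes positive impulsive noise can be seen in a one-dimensional sketch. It uses a fixed, flat structuring element of odd width, which is an assumption made for brevity; the adaptation of the structuring element described in the abstract is not implemented, and the function names are hypothetical.

```python
def erode(signal, width):
    """Grayscale erosion with a flat structuring element (window minimum)."""
    half = width // 2
    return [min(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def dilate(signal, width):
    """Grayscale dilation with a flat structuring element (window maximum)."""
    half = width // 2
    return [max(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def opening(signal, width):
    """Opening = erosion followed by dilation with the same element."""
    return dilate(erode(signal, width), width)


# Two positive impulses (9 and 8) sit on a slowly varying background.
noisy = [1, 1, 2, 9, 2, 1, 1, 8, 1]
cleaned = opening(noisy, 3)
# cleaned == [1, 1, 2, 2, 2, 1, 1, 1, 1]
```

Structures narrower than the element are flattened while the wider plateau around index 3 survives, which is why the shape of the structuring element decides what is kept and what is removed.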
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
The narrow bandwidth limitation of 300-3400 Hz on the public switched telephone network results in speech quality deterioration. In this paper, we propose an artificial bandwidth extension approach that reconstructs the missing lower band of 50-300 Hz using sinusoidal synthesis based on the first formant location. Sinusoidal synthesis generates sinusoidal waves with a harmonic structure. The proposed method detects the fundamental frequency using an autocorrelation method based on the YIN algorithm, where threshold processing avoids false fundamental frequency detection on unvoiced sounds. The amplitude of the sinusoidal waves is calculated in the time domain from the weighted energy of the 300-600 Hz band. Since the first formant location corresponds to the first peak of the spectral envelope, we reconstruct the harmonic structure so as to avoid both attenuation and overemphasis by increasing the weight when the first formant location is lower, and decreasing it when the location is higher. The subjective and objective evaluations show that the proposed method reduces the speech quality difference between the original speech signal and the bandwidth-extended speech signal.
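The sinusoidal synthesis step can be sketched as follows, assuming the fundamental frequency and a single amplitude are already known. The fundamental frequency detection, the formant-dependent weighting, the 16 kHz sampling rate, and the function names are all outside what the abstract specifies and are assumptions here.

```python
import math


def harmonic_frequencies(f0, low=50.0, high=300.0):
    """Harmonics k*f0 that fall inside the missing low band [low, high]."""
    freqs = []
    k = 1
    while k * f0 <= high:
        if k * f0 >= low:
            freqs.append(k * f0)
        k += 1
    return freqs


def synthesize_low_band(f0, amplitude, num_samples, sample_rate=16000):
    """Sum of equal-amplitude sinusoids at the in-band harmonics of f0."""
    freqs = harmonic_frequencies(f0, 50.0, 300.0)
    return [amplitude * sum(math.sin(2 * math.pi * f * n / sample_rate)
                            for f in freqs)
            for n in range(num_samples)]


# A voiced frame with an assumed fundamental of 125 Hz: harmonics 125 and 250 Hz.
harmonics = harmonic_frequencies(125.0)
frame = synthesize_low_band(125.0, amplitude=0.1, num_samples=320)
```

The synthesized frame would then be added to the narrowband signal; in the method described above, `amplitude` would come from the weighted 300-600 Hz energy rather than being a constant.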
Yosuke SUGIURA Arata KAWAMURA Youji IIGUNI
This paper proposes a comb filter design method which utilizes two linear phase FIR filters for flexibly adjusting the comb filter's frequency response. The first FIR filter is used to individually adjust the notch gains, which denote the local minimum gains of the comb filter's frequency response. The second FIR filter is used to design the elimination bandwidths for individual notch gains. We also derive an efficient comb filter by incorporating these two FIR filters with an all-pass filter which is used in a conventional comb filter to accurately align the nulls with the undesired harmonic frequencies. Several design examples of the derived comb filter show the effectiveness of the proposed comb filter design method.
Makoto NAKASHIZUKA Hiroyuki OKUMURA Youji IIGUNI
In this paper, we propose a method for supervised single-channel speech separation through sparse decomposition using periodic signal models. The proposed separation method employs sparse decomposition, which decomposes a signal into a set of periodic signals under a sparsity penalty. In order to achieve separation through sparse decomposition, the decomposed periodic signals have to be assigned to the corresponding sources. For this assignment, we introduce clustering using a K-means algorithm to group the decomposed periodic signals into as many clusters as there are speakers. After the clustering, each cluster is assigned to its corresponding speaker using preliminarily learnt codebooks. Through separation experiments, we compare our method with MaxVQ, which performs separation in the frequency spectrum domain. The experimental results in terms of signal-to-distortion ratio show that the proposed sparse decomposition method is comparable to the frequency domain approach and has a lower computational cost for the assignment of speech components.
Arata KAWAMURA Hiro IGARASHI Youji IIGUNI
Image-to-sound mapping is a technique that transforms an image to a sound signal, which is subsequently treated as a sound spectrogram. In general, the transformed sound differs from a human speech signal. Herein an efficient image-to-sound mapping method, which provides an understandable speech signal without any training, is proposed. To synthesize such a speech signal, the proposed method utilizes a multi-column image and a speech spectral phase that is obtained from a long-time observation of the speech. The original image can be retrieved from the sound spectrogram of the synthesized speech signal. The synthesized speech and the reconstructed image qualities are evaluated using objective tests.
Yuta TSUKAMOTO Arata KAWAMURA Youji IIGUNI
In this paper, a novel speech enhancement algorithm based on MAP estimation is proposed. The proposed speech enhancer adaptively changes the speech spectral density used in the MAP estimation according to the sum of the observed power spectra. In a speech segment, the speech spectral density approaches a Rayleigh distribution to preserve the quality of the enhanced speech. In a non-speech segment, it approaches an exponential distribution to reduce noise effectively. Furthermore, when the noise is super-Gaussian, we modify the width of the Gaussian so that the Gaussian model with the modified width approximates the distribution of the super-Gaussian noise. This technique is effective in suppressing residual noise. Computer experiments confirm the effectiveness of the proposed method.
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
In this paper, we propose an image-to-sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking the inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded in the spectrogram to make the mapped sound intelligible as speech. Specifically, we retain the amplitude spectra of the speech signal that have strong power and embed the image brightness in the other frequency bands. Retaining the strong-power amplitude spectra preserves the speech spectral envelope and improves the speech quality of the mapped sound. The weak-power amplitude spectra of the mapped sound represent the image brightness, so the image can be successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.
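The basic mapping, without the speech embedding, can be sketched as a round trip from image columns to sound frames and back. The sketch assumes non-overlapping frames, an arbitrary fixed phase, and a tiny two-column "image"; none of these details come from the abstract, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n for t in range(n)]


def column_to_frame(column, phases):
    """Treat one image column as amplitudes of bins 0..N/2 and return N samples.

    phases holds the phase of bins 1..N/2-1; DC and Nyquist stay real so
    that the conjugate-symmetric spectrum yields a real-valued frame.
    """
    half = len(column) - 1
    n = 2 * half
    spectrum = [0j] * n
    spectrum[0] = complex(column[0])
    spectrum[half] = complex(column[half])
    for k in range(1, half):
        spectrum[k] = column[k] * cmath.exp(1j * phases[k - 1])
        spectrum[n - k] = spectrum[k].conjugate()
    return [v.real for v in idft(spectrum)]


def frame_to_column(frame):
    """Recover the brightness column as the amplitude spectrum of the frame."""
    half = len(frame) // 2
    return [abs(v) for v in dft(frame)[:half + 1]]


# Two image columns with five brightness values each (bins 0..4, N = 8).
image = [[0.0, 0.5, 1.0, 0.25, 0.75], [1.0, 0.0, 0.5, 0.5, 0.0]]
phases = [0.3, -1.2, 2.0]
sound = [s for col in image for s in column_to_frame(col, phases)]
recovered = [frame_to_column(sound[i:i + 8]) for i in range(0, len(sound), 8)]
```

Because only the amplitude carries the image, the phase is free; the method described above uses that freedom, together with the strong-power speech bins, to make the sound intelligible.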
Sayuri KOHMURA Arata KAWAMURA Youji IIGUNI
This paper proposes a noise reduction method for impact noise with damped oscillation, such as that caused by clinking a glass or hitting a bottle. The proposed method is based on the zero phase (ZP) signal, defined as the IDFT of the spectral amplitude. When the target noise can be modeled as the sum of an impact part and a damped oscillation part, the proposed method can reduce them individually. First, the proposed method estimates the damped oscillation spectra and subtracts them from the observed spectra. Then, the impact part is reduced by replacing several samples of the ZP observed signal. Simulation results show that the proposed method improves the SNR by 10 dB for real impact noise.
Weerawut THANHIKAM Arata KAWAMURA Youji IIGUNI
In this paper, we propose a speech enhancement algorithm using MAP estimation with a variable speech spectral amplitude probability density function (speech PDF). The variable speech PDF has two adaptive shape parameters which affect the quality of the enhanced speech. Noise can be efficiently suppressed when these parameters are chosen so that the shape of the variable speech PDF fits that of the real-speech PDF. We derive the adaptive shape parameters from the real-speech PDF in various narrow SNR intervals. The proposed speech enhancement algorithm with adaptive shape parameters is examined and compared to conventional algorithms. The simulation results show that the proposed method improves the SegSNR by around 6 dB and 9 dB when the input speech signal is corrupted by white noise and tunnel noise at 0 dB, respectively.
Weerawut THANHIKAM Yuki KAMAMORI Arata KAWAMURA Youji IIGUNI
This paper proposes a wide-band noise reduction method using a zero phase (ZP) signal, which is defined as the IDFT of a spectral amplitude. When a speech signal is periodic within a short observation, the corresponding ZP signal is also periodic. On the other hand, when a noise spectral amplitude is approximately flat, its ZP signal takes nonzero values only around the origin. Hence, when a periodic speech signal is embedded in spectrally flat noise in an analysis frame, its ZP signal is periodic except around the origin. In the proposed noise reduction method, we replace the ZP signal around the origin with the ZP signal in the second or a later period, which yields an estimated speech ZP signal. The major advantages of this method are that it can reduce not only stationary but also non-stationary wide-band noise and that it does not require a prior estimate of the noise spectral amplitude. Simulation results show that the proposed noise reduction method improves the SNR by more than 5 dB for tunnel noise and 13 dB for clap noise in a low SNR environment.
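The two properties of the ZP signal used above can be checked numerically with a direct DFT. This is a minimal sketch: the frame length of 16, the test signals, and the function names are illustrative assumptions, and the replacement step of the noise reduction itself is not implemented.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def zero_phase_signal(frame):
    """ZP signal: inverse DFT of the spectral amplitude (phase set to zero)."""
    n = len(frame)
    amplitude = [abs(v) for v in dft(frame)]
    return [sum(amplitude[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# A delayed impulse has a flat spectral amplitude, so its ZP signal is
# concentrated at the origin regardless of where the impulse occurred.
impulse = [0.0] * 16
impulse[3] = 1.0
zp_impulse = zero_phase_signal(impulse)

# A frame holding four repetitions of a 4-sample pattern is periodic,
# and its ZP signal repeats with the same period of 4 samples.
periodic = [1.0, 0.5, -0.25, 0.0] * 4
zp_periodic = zero_phase_signal(periodic)
```

Since the flat-spectrum component lands at the origin and the periodic component repeats, the samples around the origin can be overwritten with those one period later, which is the replacement step described above.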
Masayuki SHIMIZU Makoto NAKASHIZUKA Youji IIGUNI
In this paper, we propose an image enlargement method using morphological operators. Our enlargement method is based on the nonlinear frequency extrapolation method (Greenspan et al., 2000), which uses a Laplacian pyramid image representation. In this method, the sampling process of input images is modeled as the Laplacian pyramid. A high resolution image is obtained with the finer scale Laplacian that is extrapolated by a nonlinear operation from a low resolution Laplacian. We propose a novel nonlinear operation for extrapolation of the finer scale Laplacian. Our nonlinear operation is realized by morphological operators and is capable of generating the finer scale Laplacian, the amplitude of which is proportional to the contrast of the edges that appear in the low resolution image. In experiments, the enlargement results given by the proposed method are demonstrated. Compared with Greenspan's method, the proposed method can recover sharp intensity transients of image edges with small artifacts.
Yuki SATOMI Arata KAWAMURA Youji IIGUNI
For an adaptive system identification filter with a stochastic input signal, a coefficient vector updated with an NLMS algorithm converges in the sense of ensemble average and the expected convergence vector has been revealed. When the input signal is periodic, the convergence of the adaptive filter coefficients has also been proved. However, its convergence vector has not been revealed. In this paper, we derive the convergence vector of adaptive filter coefficients updated with the NLMS algorithm in system identification for deterministic sinusoidal inputs. Firstly, we derive the convergence vector when a disturbance does not exist. We show that the derived convergence vector depends only on the initial vector and the sinusoidal frequencies, and it is independent of the step-size for adaptation, sinusoidal amplitudes, and phases. Next, we derive the expected convergence vector when the disturbance exists. Simulation results support the validity of the derived convergence vectors.
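The stated independence from step-size, amplitude, and phase can be checked numerically in the disturbance-free case. The sketch below assumes a single sinusoid, a 4-tap unknown system with made-up coefficients, and a zero initial vector; the function name `nlms_identify` and all numeric values are hypothetical.

```python
import math


def nlms_identify(unknown, omega, mu, amplitude=1.0, phase=0.0, steps=3000):
    """Run NLMS system identification with one sinusoidal input.

    unknown : coefficients of the system to identify (assumed FIR)
    omega   : sinusoidal frequency in rad/sample
    mu      : step-size, 0 < mu < 2
    Returns the coefficient vector after `steps` updates, starting from zero.
    """
    taps = len(unknown)
    w = [0.0] * taps
    for n in range(steps):
        # Input vector of the current and past samples of the sinusoid.
        x = [amplitude * math.sin(omega * (n - k) + phase) for k in range(taps)]
        desired = sum(h * xi for h, xi in zip(unknown, x))
        error = desired - sum(wi * xi for wi, xi in zip(w, x))
        norm = sum(xi * xi for xi in x)
        w = [wi + mu * error * xi / norm for wi, xi in zip(w, x)]
    return w


unknown = [0.8, -0.4, 0.2, 0.1]
omega = 0.3 * math.pi
w_a = nlms_identify(unknown, omega, mu=0.5)
w_b = nlms_identify(unknown, omega, mu=1.0, amplitude=2.0, phase=0.7)
# w_a and w_b agree to numerical precision, yet both differ from `unknown`.
```

Both runs end at the same vector, which is not the unknown system itself: a single sinusoid excites only a two-dimensional subspace, so the component of the initial error outside that subspace is never corrected.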
Arata KAWAMURA Youji IIGUNI Yoshio ITOH
A parallel notch filter (PNF) for eliminating a sinusoidal signal whose frequency and phase are unknown has been proposed previously. The PNF achieves both fast convergence and high estimation accuracy when the step-size for adaptation is appropriately determined. However, there has been no discussion of how to determine the appropriate step-size. In this paper, we derive the convergence condition on the step-size, and propose an adaptive algorithm with a variable step-size so that the convergence condition of the PNF is automatically satisfied. Moreover, we present a new filtering structure for the PNF that increases the convergence speed while maintaining the estimation accuracy. We also derive a variable step-size scheme for the new PNF to guarantee convergence. Simulation results show the effectiveness of the proposed method.