Arata KAWAMURA Noboru HAYASAKA Naoto SASAOKA
We propose an impact and high-pitch noise-suppression method based on spectral entropy. Spectral entropy takes a large value for a flat amplitude spectrum and a small value for a spectrum consisting of only a few lines. We model the impact noise as a signal with a flat spectrum and its damped oscillation as a high-pitch periodic signal whose spectrum consists of a few lines. Using spectral entropy, we discriminate between these noise situations and adaptively change the noise-suppression parameters of a zero phase-based impact-noise-suppression method. Simulation results show that the proposed method improves both the perceptual evaluation of speech quality and the speech-recognition rate compared to conventional methods.
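The behavior of spectral entropy described above can be illustrated with a minimal sketch. It assumes the entropy is computed from the amplitude spectrum normalized to sum to one and then divided by its maximum possible value. The abstract does not give the exact definition, so the normalization and the names `amplitude_spectrum` and `spectral_entropy` are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def amplitude_spectrum(frame):
    """Amplitude of the one-sided DFT of a real-valued frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(amplitudes, eps=1e-12):
    """Entropy of the normalized amplitude spectrum, scaled to [0, 1].

    Assumption: amplitudes are normalized to sum to one and treated as a
    probability distribution; the result is divided by log(number of bins).
    """
    total = sum(amplitudes)
    if len(amplitudes) < 2 or total <= eps:
        return 0.0
    probs = [a / total for a in amplitudes]
    entropy = -sum(p * math.log(p) for p in probs if p > eps)
    return entropy / math.log(len(amplitudes))


# An impulse has a flat spectrum; a pure tone has a single spectral line.
impulse = [1.0] + [0.0] * 63
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]

impulse_entropy = spectral_entropy(amplitude_spectrum(impulse))
tone_entropy = spectral_entropy(amplitude_spectrum(tone))
```

In this sketch the impulse yields an entropy close to one and the tone an entropy close to zero, which is the contrast the method relies on to tell impact noise from its high-pitch damped oscillation.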
Masakazu IWAI Takuya FUTAGAMI Noboru HAYASAKA Takao ONOYE
In this paper, we accelerate an automatic building extraction method that uses a variational inference Gaussian mixture model for color clustering. The improved method decreases the computational time by applying color clustering to an image with reduced resolution. In our experiment, in which we used 106 scenery images, the improved method extracted buildings 86.54% faster than the conventional methods. Furthermore, the improved method significantly increased the extraction accuracy by 1.8% or more, because the reduced image, which also had a reduced number of colors, prevented over-clustering.
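The resolution-reduction step can be sketched as follows. This is a simplified illustration: it assumes block averaging for downscaling and nearest-neighbor copying to bring cluster labels back to full resolution, neither of which is specified in the abstract. The clustering itself (the variational inference Gaussian mixture model) is omitted, and the function names are hypothetical.

```python
def downsample(image, factor):
    """Shrink an RGB image (list of rows of (r, g, b) tuples) by block averaging."""
    height, width = len(image), len(image[0])
    small = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [image[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        small.append(row)
    return small


def upsample_labels(labels, factor, height, width):
    """Copy each low-resolution cluster label back to its full-resolution block."""
    lh, lw = len(labels), len(labels[0])
    return [
        [labels[min(y // factor, lh - 1)][min(x // factor, lw - 1)] for x in range(width)]
        for y in range(height)
    ]


# 4x4 toy image: left half black, right half white.
black, white = (0, 0, 0), (255, 255, 255)
image = [[black, black, white, white] for _ in range(4)]

small = downsample(image, 2)           # 2x2 image: 4 pixels instead of 16
small_labels = [[0, 1], [0, 1]]        # stand-in for the clustering output
full_labels = upsample_labels(small_labels, 2, 4, 4)
```

With a reduction factor of f, the clustering stage sees roughly 1/f² of the original pixels, which is where the speed-up in a scheme like this would come from.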
Shingo YOSHIZAWA Noboru HAYASAKA Naoya WADA Yoshikazu MIYANAGA
This paper describes a noise robustness technique that normalizes the cepstral amplitude range in order to remove the influence of additive noise. Additive noise causes speech feature mismatches between testing and training environments and degrades recognition accuracy in noisy environments. We assume an approximate model that expresses this influence as a change in the amplitude range and the DC component of the log-spectra. Based on this model, we propose cepstral amplitude range normalization (CARN), which normalizes the cepstral distance between the maximum and minimum values. It can estimate noise-robust features without prior knowledge or adaptation. We evaluated its performance in an isolated word recognition task by using the Noisex92 database. Compared with combinations of conventional methods, CARN improved recognition accuracy under various SNR conditions.
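The idea of range normalization can be sketched as below. The sketch assumes that each cepstral coefficient is normalized along time by subtracting its mean (to remove the DC shift) and dividing by the distance between its maximum and minimum. The abstract does not state the exact formula, so this is one plausible reading rather than the authors' definition, and `range_normalize` is a hypothetical name.

```python
def range_normalize(frames, eps=1e-12):
    """Normalize each cepstral dimension over time by its max-min range.

    frames: list of feature vectors, one per time frame.
    Assumption: mean subtraction handles the DC component and division by
    (max - min) handles the amplitude range.
    """
    n_frames, n_dims = len(frames), len(frames[0])
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        trajectory = [frame[d] for frame in frames]
        mean = sum(trajectory) / n_frames
        value_range = max(trajectory) - min(trajectory)
        scale = value_range if value_range > eps else 1.0  # avoid division by zero
        for t in range(n_frames):
            out[t][d] = (trajectory[t] - mean) / scale
    return out


# Toy model of the noise effect: the range shrinks and the DC level shifts.
clean = [[0.0], [2.0], [4.0]]
noisy = [[0.5 * c[0] + 3.0] for c in clean]

clean_norm = range_normalize(clean)
noisy_norm = range_normalize(noisy)
```

Under this toy model, where noise only compresses the range and shifts the DC level, the clean and noisy trajectories normalize to the same values, which is the kind of mismatch removal the abstract describes.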
Noboru HAYASAKA Riku KASAI Takuya FUTAGAMI
In this paper, we propose a noise-robust scream detection method with the aim of expanding the use of scream detection systems, a type of sound-based security system. The proposed method uses screams enhanced by Wave-U-Net, which is effective as a noise reduction method for noisy screams. However, the enhanced screams showed frequency components different from those of clean screams, and scream-like frequency components in the noise were erroneously emphasized. Therefore, Wave-U-Net was also applied when training the Gaussian mixture models used as discriminators. We conducted detection experiments using the proposed method in various noise environments and found that the false acceptance rate was reduced by an average of 2.1% or more compared with the conventional method.
Xin XU Noboru HAYASAKA Yoshikazu MIYANAGA
This paper proposes a new algorithm named Adaptive Running Spectrum Filtering (ARSF) to restore the amplitude spectra of speech corrupted by additive noise. Based on a prior noise estimate, adaptive filtering is applied to the speech modulation spectra according to the noise conditions. The periodic structures in the amplitude spectra are thereby preserved against noise distortion. Since the amplitude spectral structures contain information on the fundamental frequency, which is the inverse of the pitch period, the ARSF algorithm is incorporated into robust pitch detection to increase its accuracy. Experimental results show that, compared with conventional methods, the proposed method significantly improves the robustness of pitch detection across several noise types and SNRs.
Naoto SASAOKA Eiji AKAMATSU Arata KAWAMURA Noboru HAYASAKA Yoshio ITOH
Speech enhancement methods have been proposed to reduce impulsive noise whose frequency characteristic is wideband. On the other hand, it is challenging to reduce the ringing sound, which is the narrowband component of impulsive noise. Therefore, we propose modeling the ringing sound and estimating it with a linear predictor (LP). However, it is difficult to estimate only the ringing sound in noisy speech because of the auto-correlation property of speech. The proposed system therefore adopts a 4th order moment-based adaptive algorithm that exploits the difference between the 4th order statistics of speech and impulsive noise. A brief analysis and simulation results show that the proposed system has the potential to reduce the ringing sound while maintaining the quality of the enhanced speech.