1-3hit |
Ryouichi NISHIMURA Seigo ENOMOTO Hiroaki KATO
Surveillance with multiple cameras and microphones is promising to trace activities of suspicious persons for security purposes. When these sensors are connected to the Internet, they might also jeopardize innocent people's privacy because, as a result of human error, signals from sensors might allow eavesdropping by malicious persons. This paper presents a proposal for exploiting super-resolution to address this problem. Super-resolution is a signal processing technique by which a high-resolution version of a signal can be reproduced from a low-resolution version of the same signal source. Because of this property, an intelligible speech signal is reconstructed from multiple sensor signals, each of which is completely unintelligible because of its sufficiently low sampling rate. A method based on Bayesian linear regression is proposed in comparison with one based on maximum likelihood. Computer simulations using a simple sinusoidal input demonstrate that the methods restore the original signal from those which are actually measured. Moreover, results show that the method based on Bayesian linear regression is more robust than maximum likelihood under various microphone configurations in noisy environments and that this advantage is remarkable when the number of microphones enrolled in the process is as small as the minimum required. Finally, listening tests using speech signals confirmed that mean opinion score (MOS) of the reconstructed signal reach 3, while those of the original signal captured at each single microphone are almost 1.
Weifeng LI Katsunobu ITOU Kazuya TAKEDA Fumitada ITAKURA
We address issues for improving hands-free speech enhancement and speech recognition performance in different car environments using a single distant microphone. This paper describes a new single-channel in-car speech enhancement method that estimates the log spectra of speech at a close-talking microphone based on the nonlinear regression of the log spectra of noisy signal captured by a distant microphone and the estimated noise. The proposed method provides significant overall quality improvements in our subjective evaluation on the regression-enhanced speech, and performed best in most objective measures. Based on our isolated word recognition experiments conducted under 15 real car environments, the proposed adaptive nonlinear regression approach shows an advantage in average relative word error rate (WER) reductions of 50.8% and 13.1%, respectively, compared to original noisy speech and ETSI advanced front-end (ETSI ES 202 050).
This paper presents the subjective speech quality evaluation in terms of the Mean Opinion Score (MOS) and the relationship between BER and subjective speech quality in a GSM-based radio system. The results show that in certain environments (hilly terrain and rural areas), a SNR (or C/I) higher than 12 dB is required for acceptable speech quality. For an acceptable speech quality, a BER(c1) better than 10-2 is needed in a GSM-based system.