IEICE global.ieice.org Site

Author Search Result

[Author] Biao WANG(9hit)

Highly-Accurate and Real-Time Speech Measurement for Laser Doppler Vibrometers
Yahui WANG Wenxi ZHANG Zhou WU Xinxin KONG Yongbiao WANG Hongxin ZHANG

PAPER-Speech and Hearing

Publicized:
2022/06/08
Vol:
E105-D No:9
Page(s):
1568-1580
Laser Doppler Vibrometers (LDVs) enable the acquisition of remote speech signals by measuring small-scale vibrations around a target. They are now widely used in the fields of information acquisition and national security. However, in remote speech detection, the coherent measurement signal is subject to environmental noise, making detecting and reconstructing speech signals challenging. To improve the detection distance and speech quality, this paper proposes a highly accurate real-time speech measurement method that can reconstruct speech from noisy coherent signals. First, the I/Q demodulation and arctangent phase discrimination are used to extract the phase transformation caused by the acoustic vibration from coherent signals. Then, an innovative smoothness criterion and a novel phase difference-based dynamic bilateral compensation phase unwrapping algorithm are used to remove any ambiguity caused by the arctangent phase discrimination in the previous step. This important innovation results in the highly accurate detection of phase jumps. After this, a further innovation is used to enhance the reconstructed speech by applying an improved waveform-based linear prediction coding method, together with adaptive spectral subtraction. This removes any impulsive or background noise. The accuracy and performance of the proposed method were validated by conducting extensive simulations and comparisons with existing techniques. The results show that the proposed algorithm can significantly improve the measurement of speech and the quality of reconstructed speech signals. The viability of the method was further assessed by undertaking a physical experiment, where LDV equipment was used to measure speech at a distance of 310m in an outdoor environment. The intelligibility rate for the reconstructed speech exceeded 95%, confirming the effectiveness and superiority of the method for long-distance laser speech measurement.
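The first two steps of the abstract above (arctangent phase discrimination on I/Q samples, followed by phase unwrapping) can be illustrated with a minimal sketch. It uses the textbook unwrapping rule, which removes 2π jumps between neighbouring samples; it is not the paper's smoothness criterion or its dynamic bilateral compensation algorithm. The function names and the synthetic test signal are assumptions made for illustration.

```python
import math


def wrapped_phase(i_samples, q_samples):
    """Arctangent phase discrimination: one phase in (-pi, pi] per I/Q pair."""
    return [math.atan2(q, i) for i, q in zip(i_samples, q_samples)]


def unwrap(phases):
    """Textbook unwrapping (assumption: true phase changes by less than pi
    between neighbouring samples). Not the paper's algorithm."""
    if not phases:
        return []
    out = [phases[0]]
    offset = 0.0
    for prev, curr in zip(phases, phases[1:]):
        delta = curr - prev
        if delta > math.pi:
            offset -= 2.0 * math.pi  # wrapped phase jumped up: subtract a turn
        elif delta < -math.pi:
            offset += 2.0 * math.pi  # wrapped phase jumped down: add a turn
        out.append(curr + offset)
    return out


# Synthetic check: a phase ramp that runs far beyond pi.
true_phase = [0.5 * n for n in range(40)]
i_sig = [math.cos(p) for p in true_phase]
q_sig = [math.sin(p) for p in true_phase]
recovered = unwrap(wrapped_phase(i_sig, q_sig))
max_error = max(abs(a - b) for a, b in zip(true_phase, recovered))
```

With noisy signals this simple rule can misplace jumps, which is the kind of failure the abstract says its unwrapping algorithm is designed to avoid.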
Efficient Early Termination Criterion for ADMM Penalized LDPC Decoder
Biao WANG Xiaopeng JIAO Jianjun MU Zhongfei WANG

LETTER-Coding Theory

Vol:
E101-A No:3
Page(s):
623-626
By tracking the rate of change of hard decisions between every two consecutive iterations of alternating direction method of multipliers (ADMM) penalized decoding, an efficient early termination (ET) criterion is proposed to improve the convergence rate of the ADMM penalized decoder for low-density parity-check (LDPC) codes. Compared to the existing ET criterion for ADMM penalized decoding, the proposed method can significantly reduce the average number of iterations at low signal-to-noise ratios with negligible performance degradation.
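The abstract above does not state the exact stopping rule, so the following is only a hypothetical sketch of the general idea: measure how many hard decisions flip between two consecutive iterations and stop once that fraction is small enough. The 0.5 decision threshold, the `threshold` parameter, and all function names are assumptions, not the letter's criterion.

```python
def hard_decisions(soft_values):
    """Map soft bit estimates in [0, 1] to hard bits (assumed 0.5 threshold)."""
    return [1 if v > 0.5 else 0 for v in soft_values]


def change_rate(prev_bits, curr_bits):
    """Fraction of positions whose hard decision flipped between iterations."""
    if len(prev_bits) != len(curr_bits):
        raise ValueError("bit vectors must have the same length")
    if not curr_bits:
        return 0.0
    flips = sum(p != c for p, c in zip(prev_bits, curr_bits))
    return flips / len(curr_bits)


def should_stop(prev_bits, curr_bits, threshold=0.0):
    """Hypothetical rule: stop when the change rate is at or below threshold."""
    return change_rate(prev_bits, curr_bits) <= threshold


# Three made-up decoder iterates for a 4-bit block.
iterates = [
    [0.9, 0.4, 0.6, 0.2],
    [0.8, 0.6, 0.7, 0.1],
    [0.9, 0.7, 0.8, 0.1],
]
bits = [hard_decisions(x) for x in iterates]
rates = [change_rate(a, b) for a, b in zip(bits, bits[1:])]
```

In a real decoder such a check would run alongside the usual stopping conditions, such as parity checks and a maximum iteration count.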
Two-Stage Block-Based Whitened Principal Component Analysis with Application to Single Sample Face Recognition
Biao WANG Wenming YANG Weifeng LI Qingmin LIAO

PAPER-Image Recognition, Computer Vision

Vol:
E95-D No:3
Page(s):
853-860
In face recognition, a challenging issue is the one-sample problem, namely, that there is only one training sample per person. Principal component analysis (PCA) seeks a low-dimensional representation that maximizes the global scatter of the training samples, and is thus suitable for the one-sample problem. However, standard PCA is sensitive to outliers and places more emphasis on relatively distant sample pairs, which implies that close samples belonging to different classes tend to be merged together. In this paper, we propose two-stage block-based whitened PCA (TS-BWPCA) to address this problem. For a specific probe image, in the first stage, we seek the K-Nearest Neighbors (K-NNs) in the whitened PCA space and thus exclude most of the samples that are distant from the probe. In the second stage, we maximize the “local” scatter by performing whitened PCA on the K nearest samples, which can extract the most discriminative information for similar classes. Moreover, a block-based scheme is incorporated to address the small sample problem. This two-stage process is a coarse-to-fine scheme that can maximize both global and local scatter, and thus overcomes the aforementioned shortcomings of PCA. Experimental results on the FERET face database show that our proposed algorithm outperforms several representative approaches.
Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions
Longbiao WANG Kazue MINAMI Kazumasa YAMAMOTO Seiichi NAKAGAWA

PAPER-Speaker Recognition

Vol:
E93-D No:9
Page(s):
2397-2406
In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods are based on MFCCs, even in noisy conditions. For MFCCs, which dominantly capture vocal tract information, only the magnitude of the Fourier Transform of time-domain speech frames is used and phase information has been ignored. The phase information and MFCCs are expected to be highly complementary because the phase information includes rich voice source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in the phase that depends on the clipping position of the input speech, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/signal-to-noise (SN) ratio, and noisy speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database with added stationary/non-stationary noise were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. The individual result of the phase information was even better than that of MFCCs in many cases with clean speech training models. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN
Longbiao WANG Seiichi NAKAGAWA Norihide KITAOKA

PAPER-ASR under Reverberant Conditions

Vol:
E91-D No:3
Page(s):
457-466
In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore not effective under these conditions. In this paper, we propose a robust speech recognition method that combines a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel) affected by reverberation can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance between neighboring frames is used to discriminate between the static speech segment (long-term spectrum) and the non-static speech segment (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN), which compensates for channel distortion depending on speaker position and is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VT-PDCMN). Because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, PDCMN/VT-PDCMN cannot normalize speaker variations, so we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We evaluated our proposed method on limited-vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the short-term spectrum based PDCMN.
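As a point of reference for the abstract above, conventional CMN subtracts the per-coefficient mean of the cepstral vectors over an utterance. The sketch below shows only that baseline operation; it does not implement the position-dependent or variable-term variants, and the function name and toy data are assumptions.

```python
def cepstral_mean_normalize(frames):
    """Conventional CMN: subtract the mean of each cepstral coefficient,
    computed over all frames of the utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy utterance: two frames of two cepstral coefficients each.
frames = [[1.0, 4.0], [3.0, 8.0]]
normalized = cepstral_mean_normalize(frames)
```

After normalization each coefficient has zero mean over the utterance, which removes a time-invariant channel offset in the cepstral domain.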
Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
Longbiao WANG Norihide KITAOKA Seiichi NAKAGAWA

PAPER-Speech and Hearing

Vol:
E94-D No:3
Page(s):
659-667
We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtraction was proposed to estimate the power spectrum of the clean speech using the power spectra of the distorted speech and the unknown impulse responses. To estimate the power spectra of the impulse responses, a variable step-size unconstrained MCLMS (VSS-UMCLMS) algorithm for identifying the impulse responses in the time domain is extended to the frequency domain. To reduce the effect of the estimation error of the channel impulse response, we normalize the early reverberation by cepstral mean normalization (CMN) instead of spectral subtraction using the estimated impulse response. Furthermore, our proposed method is combined with conventional delay-and-sum beamforming. We conducted recognition experiments on a distorted speech signal simulated by convolving multi-channel impulse responses with clean speech. The proposed method achieved a relative error reduction rate of 22.4% relative to conventional CMN. By combining the proposed method with beamforming, a relative error reduction rate of 24.5% relative to conventional CMN with beamforming was achieved using only an isolated word (with a duration of about 0.6 s) to estimate the spectrum of the impulse response.
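The spectral subtraction step mentioned above can be sketched generically: an estimate of the interfering power spectrum is subtracted from the observed power spectrum, bin by bin, with a floor to keep the result positive. In the paper the interfering term is late reverberation derived from estimated impulse responses; here it is simply given as input. The over-subtraction factor `alpha`, the `floor` value, and the function name are assumptions.

```python
def spectral_subtraction(observed_power, interference_power, alpha=1.0, floor=0.01):
    """Per-bin power spectral subtraction with a spectral floor.

    alpha scales the interference estimate; floor keeps each output bin
    at no less than floor * observed power. Both defaults are illustrative.
    """
    if len(observed_power) != len(interference_power):
        raise ValueError("spectra must have the same number of bins")
    return [
        max(x - alpha * n, floor * x)
        for x, n in zip(observed_power, interference_power)
    ]


# Toy power spectra for three frequency bins.
observed = [4.0, 1.0, 0.5]
interference = [1.0, 1.0, 1.0]
clean_estimate = spectral_subtraction(observed, interference)
```

The floor matters in the second and third bins, where plain subtraction would give zero or negative power.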
Fast Converging ADMM Penalized Decoding Method Based on Improved Penalty Function for LDPC Codes
Biao WANG

LETTER-Coding Theory

Publicized:
2020/05/08
Vol:
E103-A No:11
Page(s):
1304-1307
For low-density parity-check (LDPC) codes, the penalized decoding method based on the alternating direction method of multipliers (ADMM) can improve the decoding performance at low signal-to-noise ratios and also has low decoding complexity. Three effective ways to increase the ADMM penalized decoding speed are reducing the number of Euclidean projections, designing an effective penalty function, and selecting an appropriate layered scheduling strategy for message transmission. To further increase the decoding speed, this paper reduces the number of Euclidean projections and uses the vertical layered scheduling strategy to design a fast converging ADMM penalized decoding method based on an improved penalty function. Simulation results show that the proposed method not only improves the decoding performance but also reduces the average number of iterations and the average decoding time.
Spatially Adaptive Logarithmic Total Variation Model for Varying Light Face Recognition
Biao WANG Weifeng LI Zhimin LI Qingmin LIAO

LETTER-Image Recognition, Computer Vision

Vol:
E96-D No:1
Page(s):
155-158
In this letter, we propose an extension to the classical logarithmic total variation (LTV) model for face recognition under varying illumination conditions. LTV treats all facial areas with the same regularization parameters, which inevitably results in the loss of useful facial details and harms recognition. To address this problem, we propose to assign the regularization parameters, which balance the large-scale (illumination) and small-scale (reflectance) components, in a spatially adaptive manner. Face recognition experiments on both the Extended Yale B and the large-scale FERET databases demonstrate the effectiveness of the proposed method.
Two-Sided LPC-Based Speckle Noise Removal for Laser Speech Detection Systems
Yahui WANG Wenxi ZHANG Xinxin KONG Yongbiao WANG Hongxin ZHANG

PAPER-Speech and Hearing

Publicized:
2021/03/17
Vol:
E104-D No:6
Page(s):
850-862
Laser speech detection uses a non-contact Laser Doppler Vibrometry (LDV)-based acoustic sensor to obtain speech signals by precisely measuring voice-generated surface vibrations. Over long distances, however, the detected signal is very weak and full of speckle noise. To enhance the quality and intelligibility of the detected signal, we designed a two-sided Linear Prediction Coding (LPC)-based locator and interpolator to detect and replace speckle noise. We first studied the characteristics of speckle noise in detected signals and developed a binary-state statistical model for speckle noise generation. A two-sided LPC-based locator was then designed to locate the polluted samples, composed of an inverse decorrelator, nonlinear filter and threshold estimator. This greatly improves the detectability of speckle noise and avoids false/missed detection by improving the noise-to-signal-ratio (NSR). Finally, samples from both sides of the speckle noise were used to estimate the parameters of the interpolator and to code samples for replacing the polluted samples. Real-world speckle noise removal experiments and simulation-based comparative experiments were conducted and the results show that the proposed method is better able to locate speckle noise in laser detected speech and highly effective at replacing it.
