IEICE global.ieice.org Site

Author Search Result

[Author] YongZhu HUA (2 hits)

1-2 hits

Pitch Estimation and Voicing Classification Using Reconstructed Spectrum from MFCC
JianFeng WU, HuiBin QIN, YongZhu HUA, LingYan FAN

LETTER-Speech and Hearing

Publicized: 2017/11/15
Vol: E101-D No: 2
Page(s): 556-559
In this paper, a novel method for pitch estimation and voicing classification is proposed that uses a spectrum reconstructed from Mel-frequency cepstral coefficients (MFCCs). The proposed algorithm reconstructs the spectrum from the MFCCs using the Moore-Penrose pseudo-inverse of the Mel-scale weighting functions. The reconstructed spectrum is then compressed and filtered in the log-frequency domain. Pitch estimation is achieved by modeling the joint density of the pitch frequency and the filtered spectrum with a Gaussian Mixture Model (GMM). Voicing classification is also performed with a GMM-based model, and the test results show that over 99% of frames are correctly classified. The pitch estimation results demonstrate that the proposed GMM-based pitch estimator is highly accurate, with a relative error of 6.68% on the TIMIT database.
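The reconstruction step can be illustrated with a minimal NumPy sketch, assuming HTK-style triangular Mel filters, an orthonormal DCT-II, and natural-log compression. The function names, filter layout, and parameters (26 Mel bands, 512-point FFT, 16 kHz) are illustrative assumptions, not the authors' configuration; the later log-frequency compression/filtering and the GMM stages are not shown.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (assumed; the letter does not state the variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel weighting matrix W with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    W = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        W[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return W

def dct_matrix(n):
    """Orthonormal DCT-II matrix, so its inverse is its transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D

def mfcc_from_power(power, W, n_ceps, eps=1e-10):
    """MFCCs of one frame: DCT of the log Mel-band energies, truncated to n_ceps."""
    log_mel = np.log(W @ power + eps)
    return dct_matrix(W.shape[0])[:n_ceps] @ log_mel

def reconstruct_power(ceps, W):
    """Approximate linear-frequency power spectrum from MFCCs."""
    D = dct_matrix(W.shape[0])
    # Inverse DCT; discarded high-order coefficients are treated as zero.
    log_mel = D[: len(ceps)].T @ ceps
    mel = np.exp(log_mel)
    # The Moore-Penrose pseudo-inverse of W maps the n_mels band energies
    # back onto n_fft // 2 + 1 linear-frequency bins.
    return np.linalg.pinv(W) @ mel
```

For example, `reconstruct_power(mfcc_from_power(p, W, 13), W)` yields a 257-bin spectrum from 13 coefficients. Because `W` has far fewer rows than columns, the pseudo-inverse returns the minimum-norm spectrum consistent with the band energies, which can contain small negative values; a practical pipeline would likely clip them before any log-domain processing.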
Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network
JianFeng WU, HuiBin QIN, YongZhu HUA, LiHuan SHAO, Ji HU, ShengYing YANG

LETTER-Artificial Intelligence, Data Mining

Publicized: 2019/07/02
Vol: E102-D No: 10
Page(s): 2047-2050
This paper proposes a deep neural network (DNN) based framework to address the problem of vector quantization (VQ) for high-dimensional data. The main challenge in applying a DNN to VQ is reducing the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods are adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, and 3) adding a non-binary penalty term to the loss function. These fine-tuning methods are extensively evaluated on quantizing speech magnitude spectra. The results demonstrate that each method improves coding performance. When used to quantize 968-dimensional speech spectra with only 18 bits, the DNN-based VQ framework achieves an average PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
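The abstract names the three fine-tuning methods without giving their exact forms. The NumPy sketch below illustrates each on a hypothetical sigmoid coding layer; the function names, the noise standard deviation, the 0.5 threshold, the straight-through remark, and the u(1 - u) penalty are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coding_layer(pre_act, mode="plain", rng=None, noise_std=1.0):
    """Forward pass of a hypothetical sigmoid coding layer in an auto-encoder.

    mode="noise":  method 1, Gaussian noise added to the coding layer's input.
    mode="binary": method 2, outputs forced to exactly 0 or 1.
    mode="plain":  ordinary sigmoid units.
    """
    if mode == "noise":
        rng = np.random.default_rng() if rng is None else rng
        # Noise makes pre-activations near 0 unreliable, which tends to push
        # training toward saturated (near-binary) sigmoid outputs.
        noise = rng.normal(0.0, noise_std, size=np.shape(pre_act))
        return sigmoid(pre_act + noise)
    if mode == "binary":
        # Hard threshold in the forward pass; during training the gradient is
        # commonly passed through unchanged (straight-through estimator, assumed).
        return (sigmoid(pre_act) >= 0.5).astype(float)
    return sigmoid(pre_act)

def non_binary_penalty(codes):
    """Method 3 (one possible form): u * (1 - u) is 0 at u in {0, 1}, 0.25 at u = 0.5."""
    return float(np.mean(codes * (1.0 - codes)))

def binary_coding_error(codes):
    """Mean gap between continuous codes and the rounded bits actually transmitted."""
    return float(np.mean(np.abs(codes - np.round(codes))))

def bits_to_index(bits):
    """Pack a binary code (most significant bit first) into a codebook index."""
    index = 0
    for b in bits:
        index = (index << 1) | int(b)
    return index
```

With 18 coding units, `bits_to_index` maps each code to an index in [0, 2**18), i.e. one of 262,144 codewords per 968-dimensional spectrum. The penalty would be added to the reconstruction loss with some weight, which the abstract does not specify.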