The extraction of acoustic features for robust speech recognition is very important for improving recognition performance in realistic environments. The bi-spectrum, the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features by averaging features within a single frame. When merged with conventional Mel frequency cepstral coefficients (MFCC), which are based on the power spectrum, using principal component analysis (PCA), the proposed features yielded a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
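The bi-spectrum mentioned above can be illustrated with the standard direct (FFT-based) estimator. For a zero-mean signal, the two-dimensional Fourier transform of the third-order cumulant equals the expected triple product B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], which is nonzero only when components at f1, f2 and f1 + f2 are phase-coupled. The Python/NumPy sketch below is not the authors' implementation. The sub-block averaging inside a frame, the Hann window, the FFT size, the grid pooling of log|B| (`pooled_log_bispectrum`) and the standardize-then-SVD merge (`pca_merge`) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np


def frame_bispectrum(frame, n_sub=4, nfft=256):
    """Direct (FFT-based) bispectrum estimate for one analysis frame.

    B(k1, k2) = mean over sub-blocks of X(k1) * X(k2) * conj(X(k1 + k2)).
    Assumption: the frame is split into n_sub equal, non-overlapping
    sub-blocks whose triple products are averaged.
    """
    frame = np.asarray(frame, dtype=float)
    half = nfft // 2
    k1, k2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    acc = np.zeros((half, half), dtype=complex)
    blocks = np.array_split(frame, n_sub)
    for block in blocks:
        if len(block) > nfft:
            raise ValueError("nfft must be at least the sub-block length")
        block = block - block.mean()  # third-order cumulants assume zero mean
        X = np.fft.fft(block * np.hanning(len(block)), n=nfft)
        # k1 + k2 <= nfft - 2, so the index never wraps around.
        acc += X[k1] * X[k2] * np.conj(X[k1 + k2])
    return acc / len(blocks)


def pooled_log_bispectrum(B, n_bands=8):
    """Hypothetical reduction to a fixed-length vector:
    mean of log|B| over an n_bands x n_bands grid of equal square cells."""
    n = B.shape[0] - B.shape[0] % n_bands  # trim so the grid divides evenly
    s = n // n_bands
    L = np.log(np.abs(B[:n, :n]) + 1e-10)
    return L.reshape(n_bands, s, n_bands, s).mean(axis=(1, 3)).ravel()


def pca_merge(mfcc, bispec, n_components):
    """Concatenate per-frame MFCC and bispectral vectors (rows = frames),
    standardize each dimension, and project onto the top principal components."""
    Z = np.hstack([mfcc, bispec])
    Z = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-12)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((50, 400))  # stand-in for 25 ms frames at 16 kHz
    bis = np.array([pooled_log_bispectrum(frame_bispectrum(f)) for f in frames])
    mfcc = rng.standard_normal((50, 13))     # placeholder, not real MFCCs
    merged = pca_merge(mfcc, bis, n_components=20)
    print(merged.shape)  # (50, 20)
```

Averaging the triple product over sub-blocks is what makes the estimate usable: a single-block product is very noisy, and averaging suppresses terms whose phases are not coupled. Standardizing before PCA keeps the larger-magnitude log-bispectrum dimensions from dominating the MFCC dimensions; whether the letter does this is not stated.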
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kazuo ONOE, Shoei SATO, Shinichi HOMMA, Akio KOBAYASHI, Toru IMAI, Tohru TAKAGI, "Bi-Spectral Acoustic Features for Robust Speech Recognition," in IEICE TRANSACTIONS on Information, vol. E91-D, no. 3, pp. 631-634, March 2008, doi: 10.1093/ietisy/e91-d.3.631.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.3.631/_p
@ARTICLE{e91-d_3_631,
author={Kazuo ONOE and Shoei SATO and Shinichi HOMMA and Akio KOBAYASHI and Toru IMAI and Tohru TAKAGI},
journal={IEICE TRANSACTIONS on Information},
title={Bi-Spectral Acoustic Features for Robust Speech Recognition},
year={2008},
volume={E91-D},
number={3},
pages={631-634},
abstract={The extraction of acoustic features for robust speech recognition is very important for improving recognition performance in realistic environments. The bi-spectrum, the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features by averaging features within a single frame. When merged with conventional Mel frequency cepstral coefficients (MFCC), which are based on the power spectrum, using principal component analysis (PCA), the proposed features yielded a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.},
keywords={},
doi={10.1093/ietisy/e91-d.3.631},
ISSN={1745-1361},
month={March},}
TY  - JOUR
TI  - Bi-Spectral Acoustic Features for Robust Speech Recognition
T2  - IEICE TRANSACTIONS on Information
SP  - 631
EP  - 634
AU  - Kazuo ONOE
AU  - Shoei SATO
AU  - Shinichi HOMMA
AU  - Akio KOBAYASHI
AU  - Toru IMAI
AU  - Tohru TAKAGI
PY  - 2008
DO  - 10.1093/ietisy/e91-d.3.631
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E91-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - March 2008
AB  - The extraction of acoustic features for robust speech recognition is very important for improving recognition performance in realistic environments. The bi-spectrum, the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features by averaging features within a single frame. When merged with conventional Mel frequency cepstral coefficients (MFCC), which are based on the power spectrum, using principal component analysis (PCA), the proposed features yielded a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
ER  -