Bi-Spectral Acoustic Features for Robust Speech Recognition

Kazuo ONOE; Shoei SATO; Shinichi HOMMA; Akio KOBAYASHI; Toru IMAI; Tohru TAKAGI

doi:10.1093/ietisy/e91-d.3.631

IEICE TRANSACTIONS on Information

Summary:

Extracting acoustic features for robust speech recognition is important for improving recognition performance in realistic environments. The bi-spectrum, based on the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features with feature averaging within a single frame. Merged by principal component analysis (PCA) with conventional Mel-frequency cepstral coefficients (MFCC) based on the power spectrum, the proposed features gave a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
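As a rough illustration of the two ideas named in the summary, the sketch below computes a direct (FFT-based) bi-spectrum estimate for one frame, B(k1, k2) = X(k1) X(k2) X*(k1 + k2), and merges two feature streams with PCA. It is only a sketch under stated assumptions: the function names `frame_bispectrum` and `pca_merge`, the Hanning window, the frame length, and the concatenate-then-project form of the merge are all hypothetical choices made here, not details taken from the letter. The summary does not say how the bi-spectrum is averaged or reduced to a feature vector, so that step is not attempted.

```python
import numpy as np


def frame_bispectrum(frame):
    """Direct bi-spectrum estimate of a single frame (illustrative only).

    Returns an (N, N) complex array with
    B[k1, k2] = X[k1] * X[k2] * conj(X[(k1 + k2) mod N]),
    where X is the DFT of the mean-removed, windowed frame.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()                 # cumulants assume a zero-mean signal
    windowed = frame * np.hanning(len(frame))    # assumed window choice
    spectrum = np.fft.fft(windowed)
    n = len(spectrum)
    k = np.arange(n)
    k_sum = (k[:, None] + k[None, :]) % n        # index of the sum frequency
    return spectrum[:, None] * spectrum[None, :] * np.conj(spectrum[k_sum])


def pca_merge(feats_a, feats_b, n_components):
    """Concatenate two per-frame feature matrices and project with PCA.

    feats_a: (n_frames, dim_a), feats_b: (n_frames, dim_b).
    Assumed merge scheme: stack, center, keep the leading principal axes.
    """
    stacked = np.hstack([feats_a, feats_b])
    centered = stacked - stacked.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


if __name__ == "__main__":
    n = 64
    t = np.arange(n)
    # Three tones where the third frequency is the sum of the first two.
    frame = (np.cos(2 * np.pi * 5 * t / n)
             + np.cos(2 * np.pi * 9 * t / n)
             + np.cos(2 * np.pi * 14 * t / n))
    bispec = frame_bispectrum(frame)
    print(bispec.shape, abs(bispec[5, 9]))

    rng = np.random.default_rng(0)
    merged = pca_merge(rng.normal(size=(100, 12)), rng.normal(size=(100, 8)), 10)
    print(merged.shape)
```

The bi-spectrum magnitude is large where two frequency components and their sum frequency are all present, which is one way to read the summary's "dependency between frequency components".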

Publication: IEICE TRANSACTIONS on Information Vol.E91-D No.3 pp.631-634

Publication Date: 2008/03/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e91-d.3.631

Type of Manuscript: Special Section LETTER (Special Section on Robust Speech Processing in Realistic Environments)

Kazuo ONOE, Shoei SATO, Shinichi HOMMA, Akio KOBAYASHI, Toru IMAI, Tohru TAKAGI, "Bi-Spectral Acoustic Features for Robust Speech Recognition" in IEICE TRANSACTIONS on Information, vol. E91-D, no. 3, pp. 631-634, March 2008, doi: 10.1093/ietisy/e91-d.3.631.
Abstract: Extracting acoustic features for robust speech recognition is important for improving recognition performance in realistic environments. The bi-spectrum, based on the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features with feature averaging within a single frame. Merged by principal component analysis (PCA) with conventional Mel-frequency cepstral coefficients (MFCC) based on the power spectrum, the proposed features gave a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.3.631/_p

@ARTICLE{e91-d_3_631,
author={Kazuo ONOE and Shoei SATO and Shinichi HOMMA and Akio KOBAYASHI and Toru IMAI and Tohru TAKAGI},
journal={IEICE TRANSACTIONS on Information},
title={Bi-Spectral Acoustic Features for Robust Speech Recognition},
year={2008},
volume={E91-D},
number={3},
pages={631--634},
abstract={Extracting acoustic features for robust speech recognition is important for improving recognition performance in realistic environments. The bi-spectrum, based on the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features with feature averaging within a single frame. Merged by principal component analysis (PCA) with conventional Mel-frequency cepstral coefficients (MFCC) based on the power spectrum, the proposed features gave a 6.9\% relative reduction in word error rate in Japanese broadcast news transcription experiments.},
keywords={},
doi={10.1093/ietisy/e91-d.3.631},
ISSN={1745-1361},
month={March},}

TY - JOUR
TI - Bi-Spectral Acoustic Features for Robust Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 631
EP - 634
AU - Kazuo ONOE
AU - Shoei SATO
AU - Shinichi HOMMA
AU - Akio KOBAYASHI
AU - Toru IMAI
AU - Tohru TAKAGI
PY - 2008
DO - 10.1093/ietisy/e91-d.3.631
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2008
AB - Extracting acoustic features for robust speech recognition is important for improving recognition performance in realistic environments. The bi-spectrum, based on the Fourier transform of the third-order cumulants, expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bi-spectral acoustic features with feature averaging within a single frame. Merged by principal component analysis (PCA) with conventional Mel-frequency cepstral coefficients (MFCC) based on the power spectrum, the proposed features gave a 6.9% relative reduction in word error rate in Japanese broadcast news transcription experiments.
ER -
