
A Speech Intelligibility Estimation Method Using a Non-reference Feature Set

Toshihiro SAKANO, Yosuke KOBAYASHI, Kazuhiro KONDO


Summary:

We proposed and evaluated a speech intelligibility estimation method that does not require a clean speech reference signal. The proposed method uses the features defined in the ITU-T standard P.563, which estimates the overall quality of speech without the reference signal. We selected two sets of features from the P.563 features: the basic 9-feature set, which includes basic features that characterize both speech and background noise, e.g., cepstrum skewness and LPC kurtosis, and the extended 31-feature set with 22 additional features for a more accurate description of the degraded speech and noise, e.g., SNR, average pitch, and spectral clarity, among others. Four hundred noise samples were added to speech, and about 70% of these samples were used to train a support vector regression (SVR) model. The trained models were used to estimate the intelligibility of speech degraded by added noise. The proposed method showed a root mean square error (RMSE) of about 10% and a correlation with subjective intelligibility of about 0.93 for speech distorted with a known noise type, and an RMSE of about 16% and a correlation of about 0.84 for speech distorted with an unknown noise type, with either the 9- or the 31-dimension feature set. These results were more accurate than estimation using frequency-weighted SNR calculated in critical frequency bands, which requires the clean reference signal for its calculation. We believe this level of accuracy proves the proposed method to be applicable to real-time speech quality monitoring in the field.
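The evaluation criteria reported above, RMSE between estimated and subjective intelligibility scores and their Pearson correlation, can be sketched in a short, self-contained example. This is an illustration only: the score values below are hypothetical placeholders, not data from the paper, and the paper's SVR estimator itself is not reproduced here.

```python
import math

def rmse(est, ref):
    """Root mean square error between estimated and subjective scores (in %)."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(est))

def pearson(est, ref):
    """Pearson correlation between estimated and subjective scores."""
    n = len(est)
    me, mr = sum(est) / n, sum(ref) / n
    cov = sum((e - me) * (r - mr) for e, r in zip(est, ref))
    var_e = sum((e - me) ** 2 for e in est)
    var_r = sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var_e * var_r)

# Hypothetical intelligibility scores in percent (NOT from the paper):
subjective = [90.0, 75.0, 60.0, 40.0, 25.0]
estimated  = [85.0, 80.0, 55.0, 45.0, 20.0]

print(round(rmse(estimated, subjective), 2))     # → 5.0
print(round(pearson(estimated, subjective), 3))
```

In the paper, the subjective scores would come from listening tests and the estimated scores from the trained SVR model; RMSE of about 10% with a correlation near 0.93 (known noise) corresponds to these two metrics computed over the held-out test portion.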

Publication
IEICE TRANSACTIONS on Information, Vol.E98-D, No.1, pp.21-28
Publication Date
2015/01/01
Online ISSN
1745-1361
DOI
10.1587/transinf.2014MUP0004
Type of Manuscript
Special Section PAPER (Special Section on Enriched Multimedia)

Authors

Toshihiro SAKANO
  Yamagata University
Yosuke KOBAYASHI
  Yamagata University
Kazuhiro KONDO
  Yamagata University
