Speech Analysis Based on Modeling the Effective Voice Source

M. Shahidur RAHMAN; Tetsuya SHIMAMURA

doi:10.1093/ietisy/e89-d.3.1107

Summary:

A new system-identification-based method is proposed for accurate estimation of vocal tract parameters. A common problem with conventional linear prediction analysis stems from the harmonic structure of the excitation source of voiced speech. This harmonic structure becomes coupled with the estimation of the autoregressive (AR) coefficients, which makes the vocal tract filter difficult to estimate. This paper models the effective voice source from the residual obtained through covariance analysis in the first pass; the modeled source is then used as input to a second-pass least-squares analysis. A better source-filter separation is thus achieved. For synthetic vowels, the formant frequencies and corresponding bandwidths obtained with the proposed method are found to be more accurate than those of the conventional method by a factor of more than three (in percent). Since the source characteristic is taken into account, local variations due to the positioning of the analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by a high-pitched female speaker.
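
The first-pass step named in the summary (covariance-method linear prediction, followed by inverse filtering to obtain the residual) can be illustrated with a minimal Python sketch. This is a textbook formulation written for illustration, not the authors' implementation: the function names `covariance_lpc`, `residual`, and `formants` are hypothetical, NumPy is an assumed dependency, and the voice-source modeling and second-pass least-squares stage are deliberately left out because the summary does not specify them.

```python
import numpy as np


def covariance_lpc(x, order):
    """Covariance-method linear prediction (illustrative, hypothetical name).

    Least-squares fit of x[n] ~ -sum_k a[k] * x[n-k] over
    n = order .. len(x)-1, with no windowing of the frame.
    Returns the coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(order, len(x))
    # Column k-1 holds the frame delayed by k samples.
    X = np.column_stack([x[n - k] for k in range(1, order + 1)])
    a, *_ = np.linalg.lstsq(X, -x[n], rcond=None)
    return np.concatenate(([1.0], a))


def residual(x, a):
    """Inverse-filter x with A(z) to obtain the prediction residual."""
    return np.convolve(np.asarray(x, dtype=float), a)[: len(x)]


def formants(a, fs):
    """Formant frequencies and bandwidths (Hz) from the roots of A(z).

    Only roots in the upper half-plane are kept; real roots are ignored.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bandwidths[idx]
```

In this sketch, `residual(frame, covariance_lpc(frame, order))` would give the signal from which a source model could be built. How that model is constructed and fed to the second pass is described in the paper itself, not here.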

Publication: IEICE TRANSACTIONS on Information Vol.E89-D No.3 pp.1107-1115

Publication Date: 2006/03/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e89-d.3.1107

Type of Manuscript: Special Section PAPER (Special Section on Statistical Modeling for Speech Processing)

Category: Speech Analysis

M. Shahidur RAHMAN, Tetsuya SHIMAMURA, "Speech Analysis Based on Modeling the Effective Voice Source" in IEICE TRANSACTIONS on Information, vol. E89-D, no. 3, pp. 1107-1115, March 2006, doi: 10.1093/ietisy/e89-d.3.1107.
Abstract: A new system-identification-based method is proposed for accurate estimation of vocal tract parameters. A common problem with conventional linear prediction analysis stems from the harmonic structure of the excitation source of voiced speech. This harmonic structure becomes coupled with the estimation of the autoregressive (AR) coefficients, which makes the vocal tract filter difficult to estimate. This paper models the effective voice source from the residual obtained through covariance analysis in the first pass; the modeled source is then used as input to a second-pass least-squares analysis. A better source-filter separation is thus achieved. For synthetic vowels, the formant frequencies and corresponding bandwidths obtained with the proposed method are found to be more accurate than those of the conventional method by a factor of more than three (in percent). Since the source characteristic is taken into account, local variations due to the positioning of the analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by a high-pitched female speaker.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.3.1107/_p

@ARTICLE{e89-d_3_1107,
author={M. Shahidur RAHMAN and Tetsuya SHIMAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Speech Analysis Based on Modeling the Effective Voice Source},
year={2006},
volume={E89-D},
number={3},
pages={1107--1115},
abstract={A new system-identification-based method is proposed for accurate estimation of vocal tract parameters. A common problem with conventional linear prediction analysis stems from the harmonic structure of the excitation source of voiced speech. This harmonic structure becomes coupled with the estimation of the autoregressive (AR) coefficients, which makes the vocal tract filter difficult to estimate. This paper models the effective voice source from the residual obtained through covariance analysis in the first pass; the modeled source is then used as input to a second-pass least-squares analysis. A better source-filter separation is thus achieved. For synthetic vowels, the formant frequencies and corresponding bandwidths obtained with the proposed method are found to be more accurate than those of the conventional method by a factor of more than three (in percent). Since the source characteristic is taken into account, local variations due to the positioning of the analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by a high-pitched female speaker.},
keywords={},
doi={10.1093/ietisy/e89-d.3.1107},
ISSN={1745-1361},
month={March},}

TY - JOUR
TI - Speech Analysis Based on Modeling the Effective Voice Source
T2 - IEICE TRANSACTIONS on Information
SP - 1107
EP - 1115
AU - M. Shahidur RAHMAN
AU - Tetsuya SHIMAMURA
PY - 2006
DO - 10.1093/ietisy/e89-d.3.1107
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2006
AB - A new system-identification-based method is proposed for accurate estimation of vocal tract parameters. A common problem with conventional linear prediction analysis stems from the harmonic structure of the excitation source of voiced speech. This harmonic structure becomes coupled with the estimation of the autoregressive (AR) coefficients, which makes the vocal tract filter difficult to estimate. This paper models the effective voice source from the residual obtained through covariance analysis in the first pass; the modeled source is then used as input to a second-pass least-squares analysis. A better source-filter separation is thus achieved. For synthetic vowels, the formant frequencies and corresponding bandwidths obtained with the proposed method are found to be more accurate than those of the conventional method by a factor of more than three (in percent). Since the source characteristic is taken into account, local variations due to the positioning of the analysis window are reduced significantly. The validity of the proposed method is also examined by inspecting the spectra obtained from natural vowel sounds uttered by a high-pitched female speaker.
ER -