We propose a speech analysis method based on the source-filter model using multivariate empirical mode decomposition (MEMD). The proposed method takes multiple adjacent frames of a speech signal into account by combining their log spectra into multivariate signals. The multivariate signals are then decomposed into intrinsic mode functions (IMFs). The IMFs are divided into two groups using the peak of the autocorrelation function (ACF) of an IMF. The first group characterized by a spectral fine structure is used to estimate the fundamental frequency F0 by using the ACF, whereas the second group characterized by the frequency response of the vocal-tract filter is used to estimate formant frequencies by using a peak picking technique. There are two advantages of using MEMD: (i) the variation in the number of IMFs is eliminated in contrast with single-frame based empirical mode decomposition and (ii) the common information of the adjacent frames aligns in the same order of IMFs because of the common mode alignment property of MEMD. These advantages make the analysis more accurate than with other methods. As opposed to the conventional linear prediction (LP) and cepstrum methods, which rely on the LP order and cut-off frequency, respectively, the proposed method automatically separates the glottal-source and vocal-tract filter. The results showed that the proposed method exhibits the highest accuracy of F0 estimation and correctly estimates the formant frequencies of the vocal-tract filter.
Surasak BOONKLA
Japan Advanced Institute of Science and Technology,Thammasat University
Masashi UNOKI
Japan Advanced Institute of Science and Technology
Stanislav S. MAKHANOV
Thammasat University
Chai WUTIWIWATCHAI
National Electronics and Computer Technology Center
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Surasak BOONKLA, Masashi UNOKI, Stanislav S. MAKHANOV, Chai WUTIWIWATCHAI, "Speech Analysis Method Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition" in IEICE TRANSACTIONS on Fundamentals,
vol. E99-A, no. 10, pp. 1762-1773, October 2016, doi: 10.1587/transfun.E99.A.1762.
Abstract: We propose a speech analysis method based on the source-filter model using multivariate empirical mode decomposition (MEMD). The proposed method takes multiple adjacent frames of a speech signal into account by combining their log spectra into multivariate signals. The multivariate signals are then decomposed into intrinsic mode functions (IMFs). The IMFs are divided into two groups using the peak of the autocorrelation function (ACF) of an IMF. The first group characterized by a spectral fine structure is used to estimate the fundamental frequency F0 by using the ACF, whereas the second group characterized by the frequency response of the vocal-tract filter is used to estimate formant frequencies by using a peak picking technique. There are two advantages of using MEMD: (i) the variation in the number of IMFs is eliminated in contrast with single-frame based empirical mode decomposition and (ii) the common information of the adjacent frames aligns in the same order of IMFs because of the common mode alignment property of MEMD. These advantages make the analysis more accurate than with other methods. As opposed to the conventional linear prediction (LP) and cepstrum methods, which rely on the LP order and cut-off frequency, respectively, the proposed method automatically separates the glottal-source and vocal-tract filter. The results showed that the proposed method exhibits the highest accuracy of F0 estimation and correctly estimates the formant frequencies of the vocal-tract filter.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E99.A.1762/_p
Copy
@ARTICLE{e99-a_10_1762,
author={Surasak BOONKLA, Masashi UNOKI, Stanislav S. MAKHANOV, Chai WUTIWIWATCHAI, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Speech Analysis Method Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition},
year={2016},
volume={E99-A},
number={10},
pages={1762-1773},
abstract={We propose a speech analysis method based on the source-filter model using multivariate empirical mode decomposition (MEMD). The proposed method takes multiple adjacent frames of a speech signal into account by combining their log spectra into multivariate signals. The multivariate signals are then decomposed into intrinsic mode functions (IMFs). The IMFs are divided into two groups using the peak of the autocorrelation function (ACF) of an IMF. The first group characterized by a spectral fine structure is used to estimate the fundamental frequency F0 by using the ACF, whereas the second group characterized by the frequency response of the vocal-tract filter is used to estimate formant frequencies by using a peak picking technique. There are two advantages of using MEMD: (i) the variation in the number of IMFs is eliminated in contrast with single-frame based empirical mode decomposition and (ii) the common information of the adjacent frames aligns in the same order of IMFs because of the common mode alignment property of MEMD. These advantages make the analysis more accurate than with other methods. As opposed to the conventional linear prediction (LP) and cepstrum methods, which rely on the LP order and cut-off frequency, respectively, the proposed method automatically separates the glottal-source and vocal-tract filter. The results showed that the proposed method exhibits the highest accuracy of F0 estimation and correctly estimates the formant frequencies of the vocal-tract filter.},
keywords={},
doi={10.1587/transfun.E99.A.1762},
ISSN={1745-1337},
month={October},}
Copy
TY - JOUR
TI - Speech Analysis Method Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1762
EP - 1773
AU - Surasak BOONKLA
AU - Masashi UNOKI
AU - Stanislav S. MAKHANOV
AU - Chai WUTIWIWATCHAI
PY - 2016
DO - 10.1587/transfun.E99.A.1762
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E99-A
IS - 10
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - October 2016
AB - We propose a speech analysis method based on the source-filter model using multivariate empirical mode decomposition (MEMD). The proposed method takes multiple adjacent frames of a speech signal into account by combining their log spectra into multivariate signals. The multivariate signals are then decomposed into intrinsic mode functions (IMFs). The IMFs are divided into two groups using the peak of the autocorrelation function (ACF) of an IMF. The first group characterized by a spectral fine structure is used to estimate the fundamental frequency F0 by using the ACF, whereas the second group characterized by the frequency response of the vocal-tract filter is used to estimate formant frequencies by using a peak picking technique. There are two advantages of using MEMD: (i) the variation in the number of IMFs is eliminated in contrast with single-frame based empirical mode decomposition and (ii) the common information of the adjacent frames aligns in the same order of IMFs because of the common mode alignment property of MEMD. These advantages make the analysis more accurate than with other methods. As opposed to the conventional linear prediction (LP) and cepstrum methods, which rely on the LP order and cut-off frequency, respectively, the proposed method automatically separates the glottal-source and vocal-tract filter. The results showed that the proposed method exhibits the highest accuracy of F0 estimation and correctly estimates the formant frequencies of the vocal-tract filter.
ER -