The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.
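The band-wise spectral center of gravity named in the abstract can be illustrated with a minimal sketch. The abstract gives neither band edges nor the exact COG definition, so the power weighting, the Hann window, the function names `band_cog` and `cog_distance`, and the band limits in the usage note are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def band_cog(signal, sample_rate, f_lo, f_hi):
    """Power-weighted mean frequency (Hz) of `signal` within [f_lo, f_hi).

    Assumption: COG is taken as sum(f * P(f)) / sum(P(f)) over the band.
    """
    # Taper the segment to reduce spectral leakage before the FFT.
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Keep only the bins that fall inside the requested band.
    mask = (freqs >= f_lo) & (freqs < f_hi)
    total = power[mask].sum()
    if total == 0.0:
        return float("nan")  # no energy in the band, COG is undefined
    return float((freqs[mask] * power[mask]).sum() / total)


def cog_distance(seg_a, seg_b, sample_rate, bands):
    """Absolute COG difference between two segments, one value per band."""
    return [
        abs(band_cog(seg_a, sample_rate, lo, hi)
            - band_cog(seg_b, sample_rate, lo, hi))
        for lo, hi in bands
    ]
```

With hypothetical band limits such as `[(0.0, 1000.0), (1000.0, 4000.0), (4000.0, 8000.0)]` for low, middle, and high regions at a 16 kHz sampling rate, `cog_distance` returns one distance per band for a pair of vowel segments. Distances of this kind are what would be correlated against perceptual scores.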
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Reiko TAKOU, Hiroyuki SEGI, Tohru TAKAGI, Nobumasa SEIYAMA, "Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech" in IEICE TRANSACTIONS on Fundamentals,
vol. E95-A, no. 4, pp. 751-759, April 2012, doi: 10.1587/transfun.E95.A.751.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E95.A.751/_p
@ARTICLE{e95-a_4_751,
author={Reiko TAKOU and Hiroyuki SEGI and Tohru TAKAGI and Nobumasa SEIYAMA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech},
year={2012},
volume={E95-A},
number={4},
pages={751--759},
abstract={The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.},
keywords={},
doi={10.1587/transfun.E95.A.751},
ISSN={1745-1337},
month={April},}
TY - JOUR
TI - Spectral Features for Perceptually Natural Phoneme Replacement by Another Speaker's Speech
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 751
EP - 759
AU - Reiko TAKOU
AU - Hiroyuki SEGI
AU - Tohru TAKAGI
AU - Nobumasa SEIYAMA
PY - 2012
DO - 10.1587/transfun.E95.A.751
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E95-A
IS - 4
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - April 2012
AB - The frequency regions and spectral features that can be used to measure the perceived similarity and continuity of voice quality are reported here. A perceptual evaluation test was conducted to assess the naturalness of spoken sentences in which either a vowel or a long vowel of the original speaker was replaced by that of another. Correlation analysis between the evaluation score and the spectral feature distance was conducted to select the spectral features that were expected to be effective in measuring the voice quality and to identify the appropriate speech segment of another speaker. The mel-frequency cepstrum coefficient (MFCC) and the spectral center of gravity (COG) in the low-, middle-, and high-frequency regions were selected. A perceptual paired comparison test was carried out to confirm the effectiveness of the spectral features. The results showed that the MFCC was effective for spectra across a wide range of frequency regions, the COG was effective in the low- and high-frequency regions, and the effective spectral features differed among the original speakers.
ER -