High Quality Speech Synthesis Based on the Reproduction of the Randomness in Speech Signals

Naofumi AOKI

Summary:

A high-quality speech synthesis technique based on wavelet subband analysis of speech signals was devised to enhance the naturalness of synthesized voiced consonant speech. The technique reproduces a characteristic of voiced consonant speech, namely that it shows a markedly unvoiced feature in the high-frequency subbands. To mix this unvoiced feature appropriately into voiced speech, a noise inclusion procedure that employs the discrete wavelet transform is proposed. This paper also describes a speech synthesizer that employs several random fractal techniques, which were used mainly to enhance the naturalness of synthesized purely voiced speech. Three types of fluctuation are treated in the synthesizer: (1) pitch period fluctuation, (2) amplitude fluctuation, and (3) waveform fluctuation. In addition, instead of a normal impulse train, a triangular pulse is used as a simple model of the glottal excitation pulse. Because the spectrum of the triangular pulse falls off more steeply than the -6 dB/oct characteristic required of the glottal excitation pulse, the random fractal interpolation technique was applied to compensate for this degraded frequency characteristic. Psychoacoustic experiments were carried out to evaluate the developed speech synthesis system. The experiments focused in particular on how effectively the mixed excitation scheme contributed to the naturalness of voiced consonant speech. Although the proposed techniques are only minor modifications to the conventional LPC (linear predictive coding) speech synthesizer, the subjective evaluation suggested that the system could effectively improve the naturalness of synthesized speech, which tends to degrade in the conventional LPC speech synthesis scheme.
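The noise inclusion idea in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not state which wavelet, how many decomposition levels, or what mixing gains were used, so the one-level Haar transform, the Gaussian noise source, and the names `triangular_pulse_train`, `haar_dwt`, `haar_idwt`, and `mix_noise_in_high_band` are all assumptions made for illustration. The sketch builds a triangular-pulse excitation, splits it into a low and a high subband, adds noise only to the high subband, and reconstructs the signal.

```python
import math
import random


def triangular_pulse_train(n_samples, period, width):
    """Voiced excitation: one symmetric triangular pulse per pitch period."""
    half = width / 2.0
    signal = [0.0] * n_samples
    for start in range(0, n_samples, period):
        for k in range(width):
            if start + k < n_samples:
                signal[start + k] = 1.0 - abs(k - half) / half
    return signal


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence.

    Returns (approximation, detail): the low and high subbands.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def mix_noise_in_high_band(voiced, noise_gain, rng):
    """Add Gaussian noise to the high subband only (assumed mixing rule)."""
    approx, detail = haar_dwt(voiced)
    noisy_detail = [d + noise_gain * rng.gauss(0.0, 1.0) for d in detail]
    return haar_idwt(approx, noisy_detail)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    voiced = triangular_pulse_train(n_samples=64, period=16, width=8)
    mixed = mix_noise_in_high_band(voiced, noise_gain=0.1, rng=rng)
    print(len(mixed))
```

Because the noise is added in the wavelet domain, the low subband of the excitation is left unchanged, which is the property that makes a subband-wise mix attractive for voiced consonants. A fuller version would presumably use more decomposition levels and a per-band gain, and would feed the result to an LPC synthesis filter.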

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E84-A No.9 pp.2198-2206

Publication Date: 2001/09/01

Type of Manuscript: Special Section PAPER (Special Section on Nonlinear Theory and its Applications)

Category: Image & Signal Processing

Naofumi AOKI, "High Quality Speech Synthesis Based on the Reproduction of the Randomness in Speech Signals" in IEICE TRANSACTIONS on Fundamentals, vol. E84-A, no. 9, pp. 2198-2206, September 2001.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e84-a_9_2198/_p

@ARTICLE{e84-a_9_2198,
author={Naofumi AOKI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={High Quality Speech Synthesis Based on the Reproduction of the Randomness in Speech Signals},
year={2001},
volume={E84-A},
number={9},
pages={2198-2206},
month={September}
}

TY - JOUR
TI - High Quality Speech Synthesis Based on the Reproduction of the Randomness in Speech Signals
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2198
EP - 2206
AU - Naofumi AOKI
PY - 2001
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E84-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2001
ER -