An Efficient Mandarin Text-to-Speech System on Time Domain

Yih-Jeng LIN; Ming-Shing YU

Summary:

This paper describes a complete Mandarin text-to-speech system that works in the time domain. We take advantage of advances in memory technology, which offers ever-increasing capacity at ever-lower prices, to collect as many synthesis units as possible for a Mandarin text-to-speech system. This allowed us to develop simpler speech processing techniques and to achieve faster processing on an ordinary personal computer. We also developed careful methods to measure the intelligibility, comprehensibility, and naturalness of a Mandarin text-to-speech system. Our system performs very well compared with existing systems. We first develop a set of algorithms and methods that handle features of the syllables such as duration, amplitude, fundamental frequency, and pause. Based on these algorithms and methods, we then build a Mandarin text-to-speech system. Given any Chinese text in a computerized form, e.g., in BIG-5 code representation, our system can pronounce the text in real time. The system runs on an IBM-compatible 80486 PC with no special hardware for signal processing. The system is evaluated with a proposed subjective evaluation method. In an evaluation by 51 undergraduate students, the intelligibility of the system is 99.5%, the comprehensibility is 92.6%, and the naturalness is 81.512 points on a 100-point grading scale (the highest score is 100 points and the lowest is 0 points). A further 40 Ph.D. students performed the same naturalness evaluation, which gave a naturalness score of 82.8 points on the same scale.
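As a small illustration of the BIG-5 input mentioned in the summary, the following Python sketch decodes BIG-5 bytes into individual characters, which is roughly the form a text-to-speech front end would need before looking up syllables. It is only a sketch under stated assumptions: the function name `big5_to_characters` and the sample text are hypothetical, it uses Python's built-in `big5` codec, and it says nothing about how the system described here actually processes its input.

```python
def big5_to_characters(data: bytes) -> list[str]:
    """Decode BIG-5 bytes and return the non-whitespace characters in order.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Python ships a "big5" codec in its standard library.
    text = data.decode("big5")
    # Drop whitespace so that each remaining item is one character to pronounce.
    return [ch for ch in text if not ch.isspace()]


# Assumed sample input: four Chinese characters encoded as BIG-5.
sample = "中文語音".encode("big5")
characters = big5_to_characters(sample)
print(len(sample), "bytes decoded into", len(characters), "characters")
```

Each of these Chinese characters occupies two bytes in BIG-5, so the byte count of the sample is twice its character count.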

Publication: IEICE TRANSACTIONS on Information Vol.E81-D No.6 pp.545-555

Publication Date: 1998/06/25

Category: Speech Processing and Acoustics

Yih-Jeng LIN, Ming-Shing YU, "An Efficient Mandarin Text-to-Speech System on Time Domain" in IEICE TRANSACTIONS on Information, vol. E81-D, no. 6, pp. 545-555, June 1998.
URL: https://global.ieice.org/en_transactions/information/10.1587/e81-d_6_545/_p

@ARTICLE{e81-d_6_545,
author={Yih-Jeng LIN and Ming-Shing YU},
journal={IEICE TRANSACTIONS on Information},
title={An Efficient Mandarin Text-to-Speech System on Time Domain},
year={1998},
volume={E81-D},
number={6},
pages={545-555},
month={June}
}

TY - JOUR
TI - An Efficient Mandarin Text-to-Speech System on Time Domain
T2 - IEICE TRANSACTIONS on Information
SP - 545
EP - 555
AU - Yih-Jeng LIN
AU - Ming-Shing YU
PY - 1998
JO - IEICE TRANSACTIONS on Information
VL - E81-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 1998
ER -