Yoshiyuki HARA, Tsuneo NITTA, Hiroyoshi SAITO, Ken'ichiro KOBAYASHI
Text-to-speech synthesis (TTS) is currently one of the most important media conversion techniques. In this paper, we describe a Japanese TTS card developed for building a personal-computer-based multimedia platform, and a TTS software package developed for a workstation-based multimedia platform. Some applications of this hardware and software are also discussed. The TTS system consists of two stages: a linguistic processing stage that converts text into phonetic and prosodic information, and a speech processing stage that produces speech from the resulting phonetic and prosodic symbols. The linguistic processing stage uses morphological analysis, rewriting rules for accent movement and pause insertion, and other techniques to give the synthesized speech correct accentuation and natural-sounding intonation. The speech processing stage employs the cepstrum method with consonant-vowel (CV) syllables as the synthesis unit to achieve clear and smooth synthesized speech. All of the processing needed to convert Japanese text (a mixture of Kanji and Kana characters) into synthesized speech is performed on the TTS card itself. The card can therefore be used in a wide range of applications, including electronic mail and telephone service systems, without placing any processing burden on the personal computer. The TTS software was used to build an E-mail reading tool on a workstation.
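The two-stage structure described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the processing implemented on the TTS card: the lexicon entries, accent values, function names, and pitch values are all hypothetical. Greedy longest-match segmentation stands in for morphological analysis, a textbook high/low pitch pattern per word stands in for the accent rewriting rules, and the speech stage only attaches a placeholder pitch to each CV-style unit rather than performing cepstrum synthesis.

```python
from typing import List, Tuple

PAUSE = "、"  # a Japanese comma is treated as a pause marker (assumption)

# Hypothetical toy lexicon: surface form -> (CV-style units, accent type).
# Accent type 0 = no pitch fall; n >= 1 = pitch falls after the n-th unit.
# The entries are illustrative values, not data from the paper.
LEXICON = {
    "メール": (["me", "e", "ru"], 1),
    "を": (["o"], 0),
    "読む": (["yo", "mu"], 1),
}


def segment(text: str) -> List[str]:
    """Greedy longest-match segmentation, a crude stand-in for morphological analysis."""
    max_len = max(len(word) for word in LEXICON)
    words: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == PAUSE:
            words.append(PAUSE)
            i += 1
            continue
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in LEXICON:
                words.append(text[i:i + n])
                i += n
                break
        else:
            raise ValueError(f"no lexicon entry at position {i}")
    return words


def is_high(unit_index: int, accent: int) -> bool:
    """Simplified high/low pitch for the unit_index-th unit (1-based) of a word."""
    if accent == 1:
        return unit_index == 1
    return unit_index >= 2 and (accent == 0 or unit_index <= accent)


def linguistic_stage(text: str) -> List[Tuple[str, bool]]:
    """Text -> list of (unit, is_high) symbols, with 'pau' marking a pause."""
    symbols: List[Tuple[str, bool]] = []
    for word in segment(text):
        if word == PAUSE:
            symbols.append(("pau", False))
            continue
        units, accent = LEXICON[word]
        for k, cv in enumerate(units, start=1):
            symbols.append((cv, is_high(k, accent)))
    return symbols


def speech_stage(symbols: List[Tuple[str, bool]]) -> List[Tuple[str, float]]:
    """Symbols -> (unit, pitch in Hz) pairs; placeholder values, no waveform is produced."""
    frames: List[Tuple[str, float]] = []
    for cv, high in symbols:
        if cv == "pau":
            frames.append((cv, 0.0))
        else:
            frames.append((cv, 140.0 if high else 110.0))
    return frames


if __name__ == "__main__":
    for unit, pitch in speech_stage(linguistic_stage("メールを、読む")):
        print(unit, pitch)
```

The point of the sketch is the interface between the stages: the linguistic stage emits only phonetic and prosodic symbols, so the speech stage can be replaced (for example, by a real unit-concatenation synthesizer) without touching the text analysis.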