WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications

Masanori MORISE; Fumiya YOKOMORI; Kenji OZAWA

doi:10.1587/transinf.2015EDP7457

Open Access

Summary:

A vocoder-based speech synthesis system named WORLD was developed to improve the sound quality of real-time speech applications. Vocoder-based speech analysis, manipulation, and synthesis are used in many kinds of speech research. Although several high-quality speech synthesis systems have been developed, their high computational costs have made real-time processing difficult. The new system achieves both high sound quality and fast processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech, including consonants. Its processing speed was also compared with that of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and its real-time factor (RTF) indicated that it was fast enough for real-time processing.
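The summary relies on the real-time factor (RTF) to argue that WORLD is fast enough for real-time use, but it does not spell out the definition. The sketch below assumes the common definition, RTF = processing time / duration of the processed speech, so an RTF below 1.0 means the processing finishes faster than the audio plays. `real_time_factor` and `measure_rtf` are hypothetical helpers, and the `process` argument is a placeholder for any analysis/synthesis pipeline; this code does not call WORLD itself.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return RTF = processing time / duration of the processed audio.

    Assumed (common) definition: RTF < 1.0 means faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return processing_seconds / audio_seconds


def measure_rtf(process: Callable[[list], object], samples: list, fs: int) -> float:
    """Time one call of `process` on `samples` (sampled at `fs` Hz) and return its RTF.

    `process` is a hypothetical stand-in for a vocoder analysis/synthesis pipeline.
    """
    audio_seconds = len(samples) / fs
    start = time.perf_counter()
    process(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # 2.0 s of processing for 10.0 s of speech: RTF = 0.2, i.e. faster than real time.
    print(real_time_factor(2.0, 10.0))

    fs = 16000
    dummy = [0.0] * fs  # 1 s of silent dummy audio
    # Placeholder workload (signal energy) in place of a real vocoder.
    print(measure_rtf(lambda x: sum(v * v for v in x), dummy, fs))
```

Measuring wall-clock time around a single call is the simplest approach; in practice one would average several runs and report the hardware used, since RTF depends on the machine.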

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.7 pp.1877-1884

Publication Date: 2016/07/01

Publicized: 2016/04/05

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7457

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Masanori MORISE
  University of Yamanashi
Fumiya YOKOMORI
  University of Yamanashi
Kenji OZAWA
  University of Yamanashi

Keywords

speech analysis, speech synthesis, vocoder, sound quality, real-time processing

Cite this

Masanori MORISE, Fumiya YOKOMORI, and Kenji OZAWA, "WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications," IEICE TRANSACTIONS on Information, vol. E99-D, no. 7, pp. 1877–1884, July 2016, doi: 10.1587/transinf.2015EDP7457.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7457/_p

@ARTICLE{e99-d_7_1877,
author={Masanori MORISE and Fumiya YOKOMORI and Kenji OZAWA},
journal={IEICE TRANSACTIONS on Information},
title={{WORLD}: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications},
year={2016},
volume={E99-D},
number={7},
pages={1877--1884},
abstract={A vocoder-based speech synthesis system named WORLD was developed to improve the sound quality of real-time speech applications. Vocoder-based speech analysis, manipulation, and synthesis are used in many kinds of speech research. Although several high-quality speech synthesis systems have been developed, their high computational costs have made real-time processing difficult. The new system achieves both high sound quality and fast processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech, including consonants. Its processing speed was also compared with that of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and its real-time factor (RTF) indicated that it was fast enough for real-time processing.},
keywords={speech analysis, speech synthesis, vocoder, sound quality, real-time processing},
doi={10.1587/transinf.2015EDP7457},
ISSN={1745-1361},
month=jul,
}

TY  - JOUR
TI  - WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications
T2  - IEICE TRANSACTIONS on Information
SP  - 1877
EP  - 1884
AU  - MORISE, Masanori
AU  - YOKOMORI, Fumiya
AU  - OZAWA, Kenji
PY  - 2016
DO  - 10.1587/transinf.2015EDP7457
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E99-D
IS  - 7
JA  - IEICE TRANSACTIONS on Information
Y1  - 2016/07/01/
KW  - speech analysis
KW  - speech synthesis
KW  - vocoder
KW  - sound quality
KW  - real-time processing
AB  - A vocoder-based speech synthesis system named WORLD was developed to improve the sound quality of real-time speech applications. Vocoder-based speech analysis, manipulation, and synthesis are used in many kinds of speech research. Although several high-quality speech synthesis systems have been developed, their high computational costs have made real-time processing difficult. The new system achieves both high sound quality and fast processing. It consists of three analysis algorithms and one synthesis algorithm proposed in our previous research. The effectiveness of the system was evaluated by comparing its output against natural speech, including consonants. Its processing speed was also compared with that of conventional systems. The results showed that WORLD was superior to the other systems in terms of both sound quality and processing speed. In particular, it was over ten times faster than the conventional systems, and its real-time factor (RTF) indicated that it was fast enough for real-time processing.
ER  -