Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005

Heiga ZEN; Tomoki TODA; Masaru NAKAMURA; Keiichi TOKUDA

IEICE TRANSACTIONS on Information

Summary :

In January 2005, an open evaluation of corpus-based text-to-speech synthesis systems using common speech datasets, named Blizzard Challenge 2005, was conducted. The Nitech group participated in this challenge, entering an HMM-based speech synthesis system called Nitech-HTS 2005. This paper describes the technical details, building processes, and performance of our system. We first give an overview of the basic HMM-based speech synthesis system, and then describe new features integrated into Nitech-HTS 2005, such as STRAIGHT-based vocoding, acoustic modeling based on hidden semi-Markov models (HSMMs), and a speech parameter generation algorithm considering global variance (GV). The constructed Nitech-HTS 2005 voices can generate speech waveforms at 0.3 RT (real-time ratio) on a 1.6 GHz Pentium 4 machine, and the footprints of these voices are less than 2 Mbytes. Subjective listening tests showed that the naturalness and intelligibility of the Nitech-HTS 2005 voices were much better than expected.

Publication: IEICE TRANSACTIONS on Information Vol.E90-D No.1 pp.325-333

Publication Date: 2007/01/01

Online ISSN: 1745-1361

Type of Manuscript: PAPER

Category: Speech and Hearing

Heiga ZEN, Tomoki TODA, Masaru NAKAMURA, Keiichi TOKUDA, "Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005," in IEICE TRANSACTIONS on Information, vol. E90-D, no. 1, pp. 325-333, January 2007.
URL: https://global.ieice.org/en_transactions/information/10.1587/e90-d_1_325/_p

@ARTICLE{e90-d_1_325,
author={Heiga ZEN and Tomoki TODA and Masaru NAKAMURA and Keiichi TOKUDA},
journal={IEICE TRANSACTIONS on Information},
title={Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005},
year={2007},
volume={E90-D},
number={1},
pages={325-333},
keywords={},
doi={},
ISSN={1745-1361},
month={January},}

TY - JOUR
TI - Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005
T2 - IEICE TRANSACTIONS on Information
SP - 325
EP - 333
AU - Heiga ZEN
AU - Tomoki TODA
AU - Masaru NAKAMURA
AU - Keiichi TOKUDA
PY - 2007
DO -
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E90-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2007
ER -
