A Covariance-Tying Technique for HMM-Based Speech Synthesis

Keiichiro OURA; Heiga ZEN; Yoshihiko NANKAKU; Akinobu LEE; Keiichi TOKUDA

doi:10.1587/transinf.E93.D.595

Summary:

A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
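The footprint argument above can be illustrated with a rough parameter count. This is only a sketch under stated assumptions, not the authors' implementation: it assumes diagonal covariance matrices, counts only the Gaussian mean and variance values (ignoring decision trees, duration models, and other stored data), and uses hypothetical values for the number of clustered states and the feature dimension. The function name `parameter_count` is likewise illustrative.

```python
def parameter_count(num_states: int, dim: int, tied: bool) -> int:
    """Count stored values for Gaussian state distributions.

    Assumes diagonal covariances, so each covariance matrix needs
    `dim` values. With tying, one covariance is shared by all states.
    """
    mean_params = num_states * dim                 # one mean vector per state
    cov_params = dim if tied else num_states * dim # shared vs. per-state variances
    return mean_params + cov_params


# Hypothetical sizes, chosen only for illustration.
num_states, dim = 1000, 75

untied = parameter_count(num_states, dim, tied=False)
tied = parameter_count(num_states, dim, tied=True)
print(f"untied: {untied} values, tied: {tied} values, ratio: {tied / untied:.4f}")
```

Under these assumptions, tying roughly halves the number of stored distribution parameters, since the per-state variances collapse into a single shared vector while the mean vectors are kept.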

Publication: IEICE TRANSACTIONS on Information and Systems Vol.E93-D No.3 pp.595-601

Publication Date: 2010/03/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E93.D.595

Type of Manuscript: PAPER

Category: Speech and Hearing

Keiichiro OURA, Heiga ZEN, Yoshihiko NANKAKU, Akinobu LEE, Keiichi TOKUDA, "A Covariance-Tying Technique for HMM-Based Speech Synthesis," in IEICE TRANSACTIONS on Information and Systems, vol. E93-D, no. 3, pp. 595-601, March 2010, doi: 10.1587/transinf.E93.D.595.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.595/_p

@ARTICLE{e93-d_3_595,
author={Keiichiro OURA and Heiga ZEN and Yoshihiko NANKAKU and Akinobu LEE and Keiichi TOKUDA},
journal={IEICE TRANSACTIONS on Information and Systems},
title={A Covariance-Tying Technique for HMM-Based Speech Synthesis},
year={2010},
volume={E93-D},
number={3},
pages={595--601},
abstract={A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.},
keywords={},
doi={10.1587/transinf.E93.D.595},
ISSN={1745-1361},
month={March}
}

TY  - JOUR
TI  - A Covariance-Tying Technique for HMM-Based Speech Synthesis
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 595
EP  - 601
AU  - Keiichiro OURA
AU  - Heiga ZEN
AU  - Yoshihiko NANKAKU
AU  - Akinobu LEE
AU  - Keiichi TOKUDA
PY  - 2010
DO  - 10.1587/transinf.E93.D.595
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E93-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2010/03/01
AB  - A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
ER  - 