Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

Goshu NAGINO; Makoto SHOZAKAI; Tomoki TODA; Hiroshi SARUWATARI; Kiyohiro SHIKANO

doi:10.1093/ietisy/e91-d.3.607

IEICE TRANSACTIONS on Information


Summary:

This paper proposes a technique for building an effective speech corpus at lower cost by using a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker; speaker-adapted acoustic models trained on the collected utterances are then mapped into the two-dimensional space with the statistical multidimensional scaling method. Next, speakers located at the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting a sufficient number of voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, an acoustic model trained with 200 selected speakers performed equivalently to one trained with 533 non-selected speakers, a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, an acoustic model trained with 500 selected speakers performed equivalently to one trained with 1179 non-selected speakers, a cost reduction of more than 57%.
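The selection step and the reported savings can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the two-dimensional coordinates have already been produced by the scaling step, it treats "periphery" as distance from the centroid of the map (the summary does not state the actual criterion), and it assumes collection cost is proportional to the number of speakers. The function names `select_peripheral` and `cost_reduction` and the speaker IDs are hypothetical.

```python
import math


def select_peripheral(coords, n_select):
    """Return the IDs of the n_select speakers farthest from the map centroid.

    coords maps a speaker ID to its (x, y) position on the 2-D map.
    Distance from the centroid is an assumed stand-in for "periphery".
    """
    if not coords:
        return []
    cx = sum(x for x, _ in coords.values()) / len(coords)
    cy = sum(y for _, y in coords.values()) / len(coords)

    def distance(speaker):
        x, y = coords[speaker]
        return math.hypot(x - cx, y - cy)

    # Sort speakers from farthest to nearest and keep the first n_select.
    return sorted(coords, key=distance, reverse=True)[:n_select]


def cost_reduction(n_selected, n_baseline):
    """Fraction of speakers saved, assuming cost scales with speaker count."""
    return 1.0 - n_selected / n_baseline


# Hypothetical map positions for five speakers.
coords = {
    "spk_a": (0.0, 0.1),
    "spk_b": (2.0, 0.0),
    "spk_c": (-0.1, 0.0),
    "spk_d": (0.0, -2.5),
    "spk_e": (0.1, 0.0),
}
peripheral = select_peripheral(coords, 2)

# Speaker counts taken from the two experiments in the summary.
isolated_word_saving = cost_reduction(200, 533)
continuous_word_saving = cost_reduction(500, 1179)
```

With the speaker counts from the summary, the two computed fractions are consistent with the reported reductions of more than 62% and more than 57%.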

Publication: IEICE TRANSACTIONS on Information Vol.E91-D No.3 pp.607-614

Publication Date: 2008/03/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e91-d.3.607

Type of Manuscript: Special Section PAPER (Special Section on Robust Speech Processing in Realistic Environments)

Category: Corpus

Goshu NAGINO, Makoto SHOZAKAI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO, "Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method" in IEICE TRANSACTIONS on Information, vol. E91-D, no. 3, pp. 607-614, March 2008, doi: 10.1093/ietisy/e91-d.3.607.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.3.607/_p

@ARTICLE{e91-d_3_607,
author={Goshu NAGINO and Makoto SHOZAKAI and Tomoki TODA and Hiroshi SARUWATARI and Kiyohiro SHIKANO},
journal={IEICE TRANSACTIONS on Information},
title={Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method},
year={2008},
volume={E91-D},
number={3},
pages={607-614},
abstract={This paper proposes a technique for building an effective speech corpus at lower cost by using a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker; speaker-adapted acoustic models trained on the collected utterances are then mapped into the two-dimensional space with the statistical multidimensional scaling method. Next, speakers located at the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting a sufficient number of voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, an acoustic model trained with 200 selected speakers performed equivalently to one trained with 533 non-selected speakers, a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, an acoustic model trained with 500 selected speakers performed equivalently to one trained with 1179 non-selected speakers, a cost reduction of more than 57%.},
keywords={},
doi={10.1093/ietisy/e91-d.3.607},
ISSN={1745-1361},
month={March},}

TY - JOUR
TI - Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
T2 - IEICE TRANSACTIONS on Information
SP - 607
EP - 614
AU - Goshu NAGINO
AU - Makoto SHOZAKAI
AU - Tomoki TODA
AU - Hiroshi SARUWATARI
AU - Kiyohiro SHIKANO
PY - 2008
DO - 10.1093/ietisy/e91-d.3.607
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2008
AB - This paper proposes a technique for building an effective speech corpus at lower cost by using a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker; speaker-adapted acoustic models trained on the collected utterances are then mapped into the two-dimensional space with the statistical multidimensional scaling method. Next, speakers located at the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting a sufficient number of voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, an acoustic model trained with 200 selected speakers performed equivalently to one trained with 533 non-selected speakers, a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, an acoustic model trained with 500 selected speakers performed equivalently to one trained with 1179 non-selected speakers, a cost reduction of more than 57%.
ER -