
IEICE TRANSACTIONS on Information

Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

Goshu NAGINO, Makoto SHOZAKAI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO

Summary:

This paper proposes a technique for building an effective speech corpus at lower cost by utilizing a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected from each speaker, and speaker-adapted acoustic models trained with the collected utterances are mapped into a two-dimensional space by the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting enough voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, an acoustic model trained with 200 selected speakers performed equivalently to an acoustic model trained with 533 non-selected speakers, which corresponds to a cost reduction of more than 62%. In an experiment on building a continuous-word speech corpus, an acoustic model trained with 500 selected speakers performed equivalently to an acoustic model trained with 1179 non-selected speakers, which corresponds to a cost reduction of more than 57%.
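The selection procedure summarized above comes down to three steps: compute distances between the speaker-adapted models, embed the models in two dimensions, and pick the speakers farthest from the centre of the map. The Python below is a minimal sketch of those steps under assumptions that are not taken from the paper. Each model is reduced to a list of Gaussian mean vectors, one per state. The hypothetical `model_distance` averages the Euclidean distances between corresponding means. Classical (Torgerson) MDS stands in for the paper's statistical multidimensional scaling method. "Periphery" is approximated as the largest distance from the centroid of the map. The helper `cost_reduction` assumes that cost scales with the number of fully recorded speakers, which is consistent with the abstract's 62% and 57% figures.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: one adapted HMM = list of Gaussian mean vectors,
# one per state (variances and mixture weights are ignored in this sketch).
Model = Sequence[Sequence[float]]


def model_distance(a: Model, b: Model) -> float:
    """Hypothetical inter-model distance: mean Euclidean distance between
    corresponding state means. Not the distance measure used in the paper."""
    total = 0.0
    for mu_a, mu_b in zip(a, b):
        total += math.sqrt(sum((x - y) ** 2 for x, y in zip(mu_a, mu_b)))
    return total / len(a)


def distance_matrix(models: Sequence[Model]) -> List[List[float]]:
    n = len(models)
    return [[model_distance(models[i], models[j]) for j in range(n)] for i in range(n)]


def _power_iteration(m: List[List[float]], iters: int, seed: int) -> Tuple[float, List[float]]:
    """Dominant eigenpair of a symmetric matrix (assumed positive semidefinite)."""
    n = len(m)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v  # Rayleigh quotient


def classical_mds_2d(dist: List[List[float]], iters: int = 500) -> List[Tuple[float, float]]:
    """Classical (Torgerson) MDS: double-centre squared distances, keep the top two axes."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals the column means (symmetric)
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    axes = []
    for k in range(2):
        lam, v = _power_iteration(b, iters, seed=k)
        scale = math.sqrt(max(lam, 0.0))
        axes.append([scale * x for x in v])
        # Deflate so the next power iteration finds the second axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return list(zip(axes[0], axes[1]))


def select_peripheral(coords: Sequence[Tuple[float, float]], k: int) -> List[int]:
    """Indices of the k speakers farthest from the centroid of the 2-D map."""
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    radius = [math.hypot(x - cx, y - cy) for x, y in coords]
    return sorted(range(n), key=lambda i: radius[i], reverse=True)[:k]


def cost_reduction(selected: int, equivalent_baseline: int) -> float:
    """Fraction of recording cost saved, assuming cost scales with speaker count."""
    return 1.0 - selected / equivalent_baseline


if __name__ == "__main__":
    # Toy data: five single-state models near the origin, three far away.
    models = [[[0.0, 0.0]], [[0.1, 0.0]], [[0.0, 0.1]], [[-0.1, 0.0]], [[0.0, -0.1]],
              [[5.0, 0.0]], [[0.0, -4.0]], [[-3.0, 3.0]]]
    coords = classical_mds_2d(distance_matrix(models))
    print(sorted(select_peripheral(coords, 3)))  # [5, 6, 7]
    print(f"{cost_reduction(200, 533):.1%}, {cost_reduction(500, 1179):.1%}")  # 62.5%, 57.6%
```

Classical MDS is used here only because it needs nothing beyond the standard library. It assumes the inter-model distances are close to Euclidean. A realistic distance between HMMs may call for a different embedding, such as the statistical method the paper describes.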

Publication: IEICE TRANSACTIONS on Information Vol.E91-D No.3 pp.607-614
Publication Date: 2008/03/01
Online ISSN: 1745-1361
DOI: 10.1093/ietisy/e91-d.3.607
Type of Manuscript: Special Section PAPER (Special Section on Robust Speech Processing in Realistic Environments)
Category: Corpus