Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery

Michael HECK; Sakriani SAKTI; Satoshi NAKAMURA

doi:10.1587/transinf.2017EDP7175

IEICE TRANSACTIONS on Information

Summary:

In this work, we apply feature transformations that are common in supervised learning, without any prior supervision, with the goal of improving Dirichlet process Gaussian mixture model (DPGMM) based acoustic unit discovery. The motivation for using such transformations is to create feature vectors that are more suitable for clustering. These methods require labels, which makes them difficult to use in a zero resource setting. To overcome this issue, we use a first iteration of DPGMM clustering to generate frame-based class labels for the target data. The labels serve as the basis for learning linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) based feature transformations. The novelty of our approach lies in how we use a traditional acoustic model training pipeline for supervised learning to estimate feature transformations in a zero resource scenario. We show that the learned transformations greatly support the DPGMM sampler in finding better clusters, according to the performance of the DPGMM posteriorgrams on the ABX sound class discriminability task. We also introduce a method for combining posteriorgram outputs of multiple clusterings and demonstrate that such combinations can further improve sound class discriminability.
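The core idea in the summary, using cluster assignments from a first unsupervised pass as pseudo-labels for a supervised transform, can be illustrated with a minimal sketch. This is not the authors' pipeline: it covers only a textbook LDA estimate (no MLLT or fMLLR), it uses NumPy rather than an acoustic model training toolkit, and the function name `estimate_lda`, the synthetic data, and the hard-coded `pseudo_labels` (standing in for first-pass DPGMM cluster ids) are all assumptions made for illustration.

```python
import numpy as np


def estimate_lda(features, labels, out_dim):
    """Estimate an LDA projection from frame features and (pseudo-)labels.

    features: (n_frames, dim) array.
    labels:   (n_frames,) integer class ids, assumed here to come from a
              first unsupervised clustering pass (hypothetical stand-in).
    Returns a (dim, out_dim) projection matrix.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dim = features.shape[1]
    global_mean = features.mean(axis=0)

    s_within = np.zeros((dim, dim))
    s_between = np.zeros((dim, dim))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_within += centered.T @ centered
        diff = (mean_c - global_mean)[:, None]
        s_between += len(x_c) * (diff @ diff.T)

    # A small ridge keeps the within-class scatter invertible.
    ridge = 1e-6 * np.trace(s_within) / dim + 1e-12
    s_within += ridge * np.eye(dim)

    # Whiten with respect to the within-class scatter, then keep the
    # directions with the largest between-class scatter.
    evals_w, evecs_w = np.linalg.eigh(s_within)
    whiten = evecs_w @ np.diag(evals_w ** -0.5) @ evecs_w.T
    evals_b, evecs_b = np.linalg.eigh(whiten @ s_between @ whiten)
    order = np.argsort(evals_b)[::-1][:out_dim]
    return whiten @ evecs_b[:, order]


# Synthetic example: two "units" separated along axis 0, with large
# shared noise along axis 1 that carries no class information.
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0], [1.0, 10.0], size=(500, 2))
b = rng.normal([4.0, 0.0], [1.0, 10.0], size=(500, 2))
feats = np.vstack([a, b])
pseudo_labels = np.repeat([0, 1], 500)  # stand-in for first-pass cluster ids

proj = estimate_lda(feats, pseudo_labels, out_dim=1)
transformed = feats @ proj  # features to hand to a second clustering pass
```

In this toy case the learned projection keeps the informative axis and suppresses the noisy one, which is the property that would make a second clustering pass easier. How well this carries over to real speech features depends on the quality of the first-pass labels.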

Publication: IEICE TRANSACTIONS on Information Vol.E101-D No.1 pp.205-214

Publication Date: 2018/01/01

Publicized: 2017/10/20

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017EDP7175

Type of Manuscript: PAPER

Category: Speech and Hearing

Authors

Michael HECK
  Nara Institute of Science and Technology
Sakriani SAKTI
  Nara Institute of Science and Technology
Satoshi NAKAMURA
  Nara Institute of Science and Technology

Keywords

acoustic unit discovery, Bayesian nonparametrics, feature transformation, unsupervised subword modeling, zero resource

Cite this

Michael HECK, Sakriani SAKTI, Satoshi NAKAMURA, "Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 1, pp. 205-214, January 2018, doi: 10.1587/transinf.2017EDP7175.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7175/_p

@ARTICLE{e101-d_1_205,
author={Michael HECK and Sakriani SAKTI and Satoshi NAKAMURA},
journal={IEICE TRANSACTIONS on Information},
title={Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery},
year={2018},
volume={E101-D},
number={1},
pages={205--214},
abstract={In this work we utilize feature transformations that are common in supervised learning without having prior supervision, with the goal to improve Dirichlet process Gaussian mixture model (DPGMM) based acoustic unit discovery. The motivation of using such transformations is to create feature vectors that are more suitable for clustering. The need of labels for these methods makes it difficult to use them in a zero resource setting. To overcome this issue we utilize a first iteration of DPGMM clustering to generate frame based class labels for the target data. The labels serve as basis for learning linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) based feature transformations. The novelty of our approach is the way how we use a traditional acoustic model training pipeline for supervised learning to estimate feature transformations in a zero resource scenario. We show that the learned transformations greatly support the DPGMM sampler in finding better clusters, according to the performance of the DPGMM posteriorgrams on the ABX sound class discriminability task. We also introduce a method for combining posteriorgram outputs of multiple clusterings and demonstrate that such combinations can further improve sound class discriminability.},
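keywords={acoustic unit discovery, Bayesian nonparametrics, feature transformation, unsupervised subword modeling, zero resource},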
doi={10.1587/transinf.2017EDP7175},
ISSN={1745-1361},
month={January},
}

TY - JOUR
TI - Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery
T2 - IEICE TRANSACTIONS on Information
SP - 205
EP - 214
AU - Michael HECK
AU - Sakriani SAKTI
AU - Satoshi NAKAMURA
PY - 2018
DO - 10.1587/transinf.2017EDP7175
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2018
AB - In this work we utilize feature transformations that are common in supervised learning without having prior supervision, with the goal to improve Dirichlet process Gaussian mixture model (DPGMM) based acoustic unit discovery. The motivation of using such transformations is to create feature vectors that are more suitable for clustering. The need of labels for these methods makes it difficult to use them in a zero resource setting. To overcome this issue we utilize a first iteration of DPGMM clustering to generate frame based class labels for the target data. The labels serve as basis for learning linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and feature-space maximum likelihood linear regression (fMLLR) based feature transformations. The novelty of our approach is the way how we use a traditional acoustic model training pipeline for supervised learning to estimate feature transformations in a zero resource scenario. We show that the learned transformations greatly support the DPGMM sampler in finding better clusters, according to the performance of the DPGMM posteriorgrams on the ABX sound class discriminability task. We also introduce a method for combining posteriorgram outputs of multiple clusterings and demonstrate that such combinations can further improve sound class discriminability.
ER -
