Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition

Yasuhisa FUJII; Kazumasa YAMAMOTO; Seiichi NAKAGAWA

doi:10.1587/transinf.E95.D.2094

IEICE TRANSACTIONS on Information

Open Access

Summary:

In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition. HCNF combines Hidden Conditional Random Fields (HCRF) with a Multi-Layer Perceptron (MLP) and inherits the merits of both: the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF: an objective function that explicitly considers training errors, a hierarchical tandem-style feature, and a deep non-linear feature extractor for the observation function. Experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set show that HCNF can be trained realistically without any initial model and outperforms both HCRF and the triphone hidden Markov model trained with the minimum phone error (MPE) criterion.

Publication: IEICE TRANSACTIONS on Information Vol.E95-D No.8 pp.2094-2104

Publication Date: 2012/08/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E95.D.2094

Type of Manuscript: PAPER

Category: Speech and Hearing

Yasuhisa FUJII, Kazumasa YAMAMOTO, Seiichi NAKAGAWA, "Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition" in IEICE TRANSACTIONS on Information, vol. E95-D, no. 8, pp. 2094-2104, August 2012, doi: 10.1587/transinf.E95.D.2094.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.2094/_p

@ARTICLE{e95-d_8_2094,
author={Yasuhisa FUJII and Kazumasa YAMAMOTO and Seiichi NAKAGAWA},
journal={IEICE TRANSACTIONS on Information},
title={Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition},
year={2012},
volume={E95-D},
number={8},
pages={2094-2104},
abstract={In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition, which are a combination of Hidden Conditional Random Fields (HCRF) and a Multi-Layer Perceptron (MLP), and inherit their merits, namely, the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF, which is an objective function that explicitly considers training errors, provides a hierarchical tandem-style feature and includes a deep non-linear feature extractor for the observation function. We show that HCNF can be trained realistically without any initial model and outperforms HCRF and the triphone hidden Markov model trained by the minimum phone error (MPE) manner using experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set.},
keywords={},
doi={10.1587/transinf.E95.D.2094},
ISSN={1745-1361},
month={August},
}

TY - JOUR
TI - Hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 2094
EP - 2104
AU - Yasuhisa FUJII
AU - Kazumasa YAMAMOTO
AU - Seiichi NAKAGAWA
PY - 2012
DO - 10.1587/transinf.E95.D.2094
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2012
AB - In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition, which are a combination of Hidden Conditional Random Fields (HCRF) and a Multi-Layer Perceptron (MLP), and inherit their merits, namely, the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF, which is an objective function that explicitly considers training errors, provides a hierarchical tandem-style feature and includes a deep non-linear feature extractor for the observation function. We show that HCNF can be trained realistically without any initial model and outperforms HCRF and the triphone hidden Markov model trained by the minimum phone error (MPE) manner using experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set.
ER -
