Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network

Xiao-Dong WANG; Keikichi HIROSE; Jin-Song ZHANG; Nobuaki MINEMATSU

doi:10.1093/ietisy/e91-d.6.1748

IEICE TRANSACTIONS on Information

Summary:

A method was developed for automatic recognition of syllable tone types in continuous Mandarin speech by integrating two techniques: tone nucleus modeling and a neural network classifier. Tone nucleus modeling treats a syllable F0 contour as consisting of three parts: onset course, tone nucleus, and offset course. The two courses are transitions from/to neighboring syllable F0 contours, while the tone nucleus is the intrinsic part of the F0 contour. By viewing only the tone nucleus, acoustic features less affected by neighboring syllables are obtained. When tone nucleus modeling is used, automatic detection of the tone nucleus becomes crucial, and an improvement was added to the original detection method. Distinctive acoustic features for tone types are not limited to F0 contours; other prosodic features, such as waveform power and syllable duration, are also useful for tone recognition. These heterogeneous features are rather difficult to handle simultaneously in hidden Markov models (HMMs), but are easy to handle in neural networks. We adopted a multi-layer perceptron (MLP) as the neural network. Tone recognition experiments were conducted for speaker-dependent and speaker-independent cases. To show the effect of the integration, experiments were also conducted for two baselines: an HMM classifier with tone nucleus modeling, and an MLP classifier viewing the entire syllable instead of the tone nucleus. The integrated method achieved a tone recognition rate of 87.1% in the speaker-dependent case and 80.9% in the speaker-independent case, which was about a 10% relative error reduction compared to the baselines.
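
The "relative error reduction" quoted in the summary is measured against the baseline's error rate, not its accuracy. The sketch below is only a back-of-envelope illustration of that arithmetic: the function names are hypothetical, the summary does not report the baseline recognition rates, and the "about 10%" figure is treated as exactly 0.10 purely for the sake of the example.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed by the new system.

    Both accuracies are recognition rates in percent.
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, rer: float) -> float:
    """Baseline accuracy (percent) implied by a new accuracy and a
    relative error reduction `rer` given as a fraction in [0, 1)."""
    if not 0.0 <= rer < 1.0:
        raise ValueError("rer must be in [0, 1)")
    new_err = 100.0 - new_acc
    return 100.0 - new_err / (1.0 - rer)


if __name__ == "__main__":
    # Recognition rates taken from the summary; 0.10 is an assumed exact
    # value standing in for "about 10%".
    for label, acc in [("speaker dependent", 87.1),
                       ("speaker independent", 80.9)]:
        implied = implied_baseline_accuracy(acc, 0.10)
        print(f"{label}: error rate {100.0 - acc:.1f}%, "
              f"implied baseline accuracy about {implied:.1f}%")
```

Under that assumption, the implied baseline accuracies come out to roughly 85.7% and 78.8%. These are derived illustrations, not results reported by the authors, and the two baselines need not share the same rate.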

Publication: IEICE TRANSACTIONS on Information Vol.E91-D No.6 pp.1748-1755

Publication Date: 2008/06/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e91-d.6.1748

Type of Manuscript: PAPER

Category: Pattern Recognition

Xiao-Dong WANG, Keikichi HIROSE, Jin-Song ZHANG, Nobuaki MINEMATSU, "Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network" in IEICE TRANSACTIONS on Information, vol. E91-D, no. 6, pp. 1748-1755, June 2008, doi: 10.1093/ietisy/e91-d.6.1748.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.6.1748/_p

@ARTICLE{e91-d_6_1748,
author={Xiao-Dong WANG and Keikichi HIROSE and Jin-Song ZHANG and Nobuaki MINEMATSU},
journal={IEICE TRANSACTIONS on Information},
title={Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network},
year={2008},
volume={E91-D},
number={6},
pages={1748-1755},
keywords={},
doi={10.1093/ietisy/e91-d.6.1748},
ISSN={1745-1361},
month={June},}

TY - JOUR
TI - Tone Recognition of Continuous Mandarin Speech Based on Tone Nucleus Model and Neural Network
T2 - IEICE TRANSACTIONS on Information
SP - 1748
EP - 1755
AU - Xiao-Dong WANG
AU - Keikichi HIROSE
AU - Jin-Song ZHANG
AU - Nobuaki MINEMATSU
PY - 2008
DO - 10.1093/ietisy/e91-d.6.1748
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E91-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 2008
ER -
