Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation

Zhipeng ZHANG; Toshiaki SUGIMURA; Sadaoki FURUI

doi:10.1093/ietisy/e88-d.9.2168

Summary:

This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.
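
The top-to-bottom model selection described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: each tree node holds a diagonal Gaussian as a stand-in for its noisy-speech HMM, the node names and parameter values are invented, and the stopping rule (keep the highest-likelihood node seen along the traced path) is one plausible reading of "likelihood maximization". The subsequent linear-transformation adaptation of the selected model is not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One tree node; a diagonal Gaussian stands in for the node's noisy-speech HMM."""
    name: str
    mean: Sequence[float]
    var: Sequence[float]
    children: List["Node"] = field(default_factory=list)


def log_likelihood(node: Node, frames: Sequence[Sequence[float]]) -> float:
    """Total log-likelihood of the feature frames under the node's Gaussian."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, node.mean, node.var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_node(root: Node, frames: Sequence[Sequence[float]]) -> Node:
    """Trace the tree from the root, following the best-scoring child at each level.

    Returns the highest-likelihood node seen along that path (assumed stopping rule).
    """
    current = root
    best, best_score = root, log_likelihood(root, frames)
    while current.children:
        scored = [(log_likelihood(child, frames), child) for child in current.children]
        score, current = max(scored, key=lambda pair: pair[0])
        if score > best_score:
            best, best_score = current, score
    return best


# Hypothetical two-level tree: the root covers all conditions, leaves are narrower.
root = Node("all", [0.0], [4.0], [
    Node("low_snr", [-2.0], [1.0]),
    Node("high_snr", [2.0], [1.0], [
        Node("car", [1.5], [0.25]),
        Node("crowd", [3.0], [0.25]),
    ]),
])

frames = [[1.4], [1.6], [1.5]]  # invented one-dimensional feature frames
print(select_node(root, frames).name)
```

Because only the children of the current best node are scored at each level, the number of likelihood evaluations grows with the depth and branching factor of the tree rather than with the total number of nodes.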

Publication: IEICE TRANSACTIONS on Information Vol.E88-D No.9 pp.2168-2176

Publication Date: 2005/09/01

DOI: 10.1093/ietisy/e88-d.9.2168

Type of Manuscript: PAPER

Category: Speech and Hearing

Zhipeng ZHANG, Toshiaki SUGIMURA, Sadaoki FURUI, "Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation" in IEICE TRANSACTIONS on Information, vol. E88-D, no. 9, pp. 2168-2176, September 2005, doi: 10.1093/ietisy/e88-d.9.2168.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.9.2168/_p

@ARTICLE{e88-d_9_2168,
author={Zhipeng ZHANG and Toshiaki SUGIMURA and Sadaoki FURUI},
journal={IEICE TRANSACTIONS on Information},
title={Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation},
year={2005},
volume={E88-D},
number={9},
pages={2168--2176},
abstract={This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.},
keywords={},
doi={10.1093/ietisy/e88-d.9.2168},
ISSN={},
month={September},}

TY - JOUR
TI - Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation
T2 - IEICE TRANSACTIONS on Information
SP - 2168
EP - 2176
AU - Zhipeng ZHANG
AU - Toshiaki SUGIMURA
AU - Sadaoki FURUI
PY - 2005
DO - 10.1093/ietisy/e88-d.9.2168
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2005
AB - This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.
ER -