This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Zhipeng ZHANG, Toshiaki SUGIMURA, Sadaoki FURUI, "Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation" in IEICE TRANSACTIONS on Information,
vol. E88-D, no. 9, pp. 2168-2176, September 2005, doi: 10.1093/ietisy/e88-d.9.2168.
Abstract: This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.9.2168/_p
Copy
@ARTICLE{e88-d_9_2168,
author={Zhipeng ZHANG, Toshiaki SUGIMURA, Sadaoki FURUI, },
journal={IEICE TRANSACTIONS on Information},
title={Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation},
year={2005},
volume={E88-D},
number={9},
pages={2168-2176},
abstract={This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.},
keywords={},
doi={10.1093/ietisy/e88-d.9.2168},
ISSN={},
month={September},}
Copy
TY - JOUR
TI - Tree-Structured Clustering Methods for Piecewise Linear-Transformation-Based Noise Adaptation
T2 - IEICE TRANSACTIONS on Information
SP - 2168
EP - 2176
AU - Zhipeng ZHANG
AU - Toshiaki SUGIMURA
AU - Sadaoki FURUI
PY - 2005
DO - 10.1093/ietisy/e88-d.9.2168
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2005
AB - This paper proposes the application of tree-structured clustering to the processing of noisy speech collected under various SNR conditions in the framework of piecewise-linear transformation (PLT)-based HMM adaptation for noisy speech. Three kinds of clustering methods are described: a one-step clustering method that integrates noise and SNR conditions and two two-step clustering methods that construct trees for each SNR condition. According to the clustering results, a noisy speech HMM is made for each node of the tree structure. Based on the likelihood maximization criterion, the HMM that best matches the input speech is selected by tracing the tree from top to bottom, and the selected HMM is further adapted by linear transformation. The proposed methods are evaluated by applying them to a Japanese dialogue recognition system. The results confirm that the proposed methods are effective in recognizing digitally noise-added speech and actual noisy speech issued by a wide range of speakers under various noise conditions. The results also indicate that the one-step clustering method gives better performance than the two-step clustering methods.
ER -