This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.
Naoki SAWADA
University of Yamanashi
Hiromitsu NISHIZAKI
University of Yamanashi
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Naoki SAWADA, Hiromitsu NISHIZAKI, "Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection" in IEICE TRANSACTIONS on Information,
vol. E99-D, no. 10, pp. 2518-2527, October 2016, doi: 10.1587/transinf.2016SLP0012.
Abstract: This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016SLP0012/_p
Copy
@ARTICLE{e99-d_10_2518,
author={Naoki SAWADA, Hiromitsu NISHIZAKI, },
journal={IEICE TRANSACTIONS on Information},
title={Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection},
year={2016},
volume={E99-D},
number={10},
pages={2518-2527},
abstract={This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.},
keywords={},
doi={10.1587/transinf.2016SLP0012},
ISSN={1745-1361},
month={October},}
Copy
TY - JOUR
TI - Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
T2 - IEICE TRANSACTIONS on Information
SP - 2518
EP - 2527
AU - Naoki SAWADA
AU - Hiromitsu NISHIZAKI
PY - 2016
DO - 10.1587/transinf.2016SLP0012
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2016
AB - This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.
ER -