Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection

Naoki SAWADA; Hiromitsu NISHIZAKI

doi:10.1587/transinf.2016SLP0012

IEICE TRANSACTIONS on Information

Summary:

This study proposes a two-pass spoken term detection (STD) method. The first pass performs phoneme-based STD using dynamic time warping (DTW), and the second pass recomputes the detection scores produced by the first pass using triphone detectors based on conditional random fields (CRFs). In the second pass, we treat STD as a sequence labeling problem. The CRF-based triphone detection models use features generated from multiple types of phoneme-based transcriptions, and they learn recognition error patterns, such as phoneme-to-phoneme confusions, within the CRF framework. Consequently, the models can detect each triphone that makes up a query term and output a detection probability for it. In experimental evaluations on two test collections, the CRF-based approach was effective for re-ranking the DTW-based detections: CRF-based re-ranking yielded absolute F-measure improvements of 2.1% and 2.0% on the two collections, respectively.
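The two-pass pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and assumes several simplifications: the first pass is a plain subsequence DTW over phoneme strings with unit substitution, insertion and deletion costs, and the second pass averages per-triphone detection probabilities that are supplied by hand in place of a trained CRF. The helper names `subsequence_dtw`, `triphones` and `rerank`, and the linear combination with weight `alpha`, are hypothetical; the summary does not state how the paper combines the two scores.

```python
from typing import Dict, List, Sequence, Tuple

Triphone = Tuple[str, ...]


def subsequence_dtw(query: Sequence[str], doc: Sequence[str]) -> Tuple[float, int]:
    """First pass (sketch): best match of `query` anywhere inside `doc`.

    Unit costs for substitution, insertion and deletion (an edit-distance
    style DTW); the match may start and end at any position in `doc`.
    Returns (distance, end position in doc).
    """
    m = len(doc)
    prev = [0.0] * (m + 1)  # empty query prefix matches anywhere at no cost
    for i in range(1, len(query) + 1):
        cur = [float(i)] + [float("inf")] * m  # query[:i] against an empty doc
        for j in range(1, m + 1):
            sub = 0.0 if query[i - 1] == doc[j - 1] else 1.0
            cur[j] = min(prev[j - 1] + sub,  # match or substitution
                         prev[j] + 1.0,      # query phoneme missing in doc
                         cur[j - 1] + 1.0)   # extra phoneme in doc
        prev = cur
    end = min(range(m + 1), key=lambda j: prev[j])
    return prev[end], end


def triphones(phonemes: Sequence[str]) -> List[Triphone]:
    """Overlapping triphones that make up a phoneme sequence."""
    return [tuple(phonemes[k:k + 3]) for k in range(len(phonemes) - 2)]


def rerank(query: Sequence[str],
           candidates: Dict[str, Tuple[float, Dict[Triphone, float]]],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Second pass (sketch): mix DTW similarity with triphone probabilities.

    `candidates` maps an id to (first-pass DTW distance,
    {triphone: detection probability}); `alpha` is a hypothetical weight.
    """
    qtri = triphones(query)
    scored = []
    for cid, (dist, probs) in candidates.items():
        # Normalize the DTW distance by query length into a similarity in [0, 1].
        dtw_sim = 1.0 - min(dist / max(len(query), 1), 1.0)
        # Average detection probability over the query's triphones.
        tri_prob = (sum(probs.get(t, 0.0) for t in qtri) / len(qtri)) if qtri else 0.0
        scored.append((cid, alpha * dtw_sim + (1.0 - alpha) * tri_prob))
    return sorted(scored, key=lambda s: s[1], reverse=True)


query = ["k", "a", "N", "s", "u"]
dist_a, _ = subsequence_dtw(query, ["o", "k", "a", "m", "s", "u"])  # 1.0
dist_b, _ = subsequence_dtw(query, ["g", "a", "m", "s", "u", "i"])  # 2.0
# Made-up probabilities standing in for the output of a triphone detector.
qtri = triphones(query)
ranked = rerank(query, {"A": (dist_a, {t: 0.1 for t in qtri}),
                        "B": (dist_b, {t: 0.9 for t in qtri})})
print(ranked)  # B now ranks above A, although A had the smaller DTW distance
```

The example shows the point of a second pass: a candidate with a worse first-pass DTW distance can move up when the triphone detectors are confident that the query's triphones are present. In the paper those probabilities come from CRF models trained on multiple phoneme-based transcriptions; here they are fixed numbers for illustration only.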

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.10 pp.2518-2527

Publication Date: 2016/10/01

Publicized: 2016/07/19

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016SLP0012

Type of Manuscript: Special Section PAPER (Special Section on Recent Advances in Machine Learning for Spoken Language Processing)

Category: Spoken term detection

Authors

Naoki SAWADA
University of Yamanashi
Hiromitsu NISHIZAKI
University of Yamanashi

Keywords

conditional random fields, phoneme-to-phoneme confusion learning, re-ranking, spoken term detection, triphone detection

Naoki SAWADA, Hiromitsu NISHIZAKI, "Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 10, pp. 2518-2527, October 2016, doi: 10.1587/transinf.2016SLP0012.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016SLP0012/_p

@ARTICLE{e99-d_10_2518,
author={Naoki SAWADA and Hiromitsu NISHIZAKI},
journal={IEICE TRANSACTIONS on Information},
title={Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection},
year={2016},
volume={E99-D},
number={10},
pages={2518--2527},
abstract={This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.},
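keywords={conditional random fields, phoneme-to-phoneme confusion learning, re-ranking, spoken term detection, triphone detection},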
doi={10.1587/transinf.2016SLP0012},
ISSN={1745-1361},
month={October},}

TY  - JOUR
TI  - Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
T2  - IEICE TRANSACTIONS on Information
SP  - 2518
EP  - 2527
AU  - Naoki SAWADA
AU  - Hiromitsu NISHIZAKI
PY  - 2016
DO  - 10.1587/transinf.2016SLP0012
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E99-D
IS  - 10
JA  - IEICE TRANSACTIONS on Information
Y1  - 2016/10/01
AB  - This study proposes a two-pass spoken term detection (STD) method. The first pass uses a phoneme-based dynamic time warping (DTW)-based STD, and the second pass recomputes detection scores produced by the first pass using conditional random fields (CRF)-based triphone detectors. In the second-pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models based on features generated from multiple types of phoneme-based transcriptions. The models train recognition error patterns such as phoneme-to-phoneme confusions in the CRF framework. Consequently, the models can detect a triphone comprising a query term with a detection probability. In the experimental evaluation of two types of test collections, the CRF-based approach worked well in the re-ranking process for the DTW-based detections. CRF-based re-ranking showed 2.1% and 2.0% absolute improvements in F-measure for each of the two test collections.
ER  - 
