Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech

Katsutoshi OHTSUKI; Tatsuo MATSUOKA; Shoichi MATSUNAGA; Sadaoki FURUI

Summary:

In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report the results of topic extraction experiments using continuous speech recognition on Japanese broadcast news utterances. We represent the topic of a news story as a combination of multiple topic words, which are important words in the news article or words relevant to the news. We statistically model the mapping from words in an article to topic words. Using this mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model that computes the degree of relevance between a topic word and a word in an article, using newspaper text covering a five-year period. The degree of relevance between those words is calculated from measures such as mutual information or the χ² method. In experiments extracting five topic words with a χ²-based model, we achieve 72% precision and 12% recall on speech recognition results. Speech recognition results generally include a number of recognition errors, which degrade topic extraction performance. To mitigate this, we employ N-best candidates and the likelihoods given by the acoustic and language models. In experiments, we find that extracting five topic words using N-best candidates and likelihood values significantly improves precision.
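
The summary does not give the exact scoring formulas, so the following is only a minimal sketch of how a χ²-based relevance model of this kind might be set up. It assumes the standard χ² statistic for a 2x2 contingency table of training-article counts, keeps only positively associated pairs, and ranks topic words by summing relevance over the words observed in an article. The corpus format and the function names (`chi_square`, `train_relevance`, `extract_topics`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table of training-article counts.

    n11: articles containing the article word and labeled with the topic word
    n10: articles containing the article word only
    n01: articles labeled with the topic word only
    n00: articles with neither
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def train_relevance(corpus):
    """Build a {(article_word, topic_word): score} table.

    corpus is a list of (article_words, topic_words) pairs; this input
    format is an assumption made for illustration.
    """
    n_docs = len(corpus)
    word_df = defaultdict(int)   # articles containing each article word
    topic_df = defaultdict(int)  # articles labeled with each topic word
    pair_df = defaultdict(int)   # articles where both occur together
    for words, topics in corpus:
        words, topics = set(words), set(topics)
        for w in words:
            word_df[w] += 1
        for t in topics:
            topic_df[t] += 1
        for w in words:
            for t in topics:
                pair_df[(w, t)] += 1

    relevance = {}
    for (w, t), n11 in pair_df.items():
        n10 = word_df[w] - n11
        n01 = topic_df[t] - n11
        n00 = n_docs - n11 - n10 - n01
        # Keep only pairs that co-occur more often than chance (assumption).
        if n11 * n00 > n10 * n01:
            relevance[(w, t)] = chi_square(n11, n10, n01, n00)
    return relevance


def extract_topics(article_words, relevance, k=5):
    """Return the k topic words with the highest summed relevance."""
    present = set(article_words)
    scores = defaultdict(float)
    for (w, t), r in relevance.items():
        if w in present:
            scores[t] += r
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

Because scores are looked up per article word and accumulated per topic word, a topic word can be returned even when it never occurs in the recognized text, which is the property the summary highlights. Weighting each recognized word by N-best likelihoods, as the summary describes, would be a further step that this sketch does not attempt.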

Publication: IEICE TRANSACTIONS on Information Vol.E85-D No.7 pp.1138-1144

Publication Date: 2002/07/01

Type of Manuscript: PAPER

Category: Speech and Hearing

Katsutoshi OHTSUKI, Tatsuo MATSUOKA, Shoichi MATSUNAGA, Sadaoki FURUI, "Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech" in IEICE TRANSACTIONS on Information, vol. E85-D, no. 7, pp. 1138-1144, July 2002.
URL: https://global.ieice.org/en_transactions/information/10.1587/e85-d_7_1138/_p

@ARTICLE{e85-d_7_1138,
author={Katsutoshi OHTSUKI and Tatsuo MATSUOKA and Shoichi MATSUNAGA and Sadaoki FURUI},
journal={IEICE TRANSACTIONS on Information},
title={Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech},
year={2002},
volume={E85-D},
number={7},
pages={1138-1144},
abstract={In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report the results of topic extraction experiments using continuous speech recognition on Japanese broadcast news utterances. We represent the topic of a news story as a combination of multiple topic words, which are important words in the news article or words relevant to the news. We statistically model the mapping from words in an article to topic words. Using this mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model that computes the degree of relevance between a topic word and a word in an article, using newspaper text covering a five-year period. The degree of relevance between those words is calculated from measures such as mutual information or the $\chi^2$ method. In experiments extracting five topic words with a $\chi^2$-based model, we achieve 72\% precision and 12\% recall on speech recognition results. Speech recognition results generally include a number of recognition errors, which degrade topic extraction performance. To mitigate this, we employ N-best candidates and the likelihoods given by the acoustic and language models. In experiments, we find that extracting five topic words using N-best candidates and likelihood values significantly improves precision.},
month={July}
}

TY - JOUR
TI - Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech
T2 - IEICE TRANSACTIONS on Information
SP - 1138
EP - 1144
AU - Katsutoshi OHTSUKI
AU - Tatsuo MATSUOKA
AU - Shoichi MATSUNAGA
AU - Sadaoki FURUI
PY - 2002
JO - IEICE TRANSACTIONS on Information
VL - E85-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2002
AB - In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report the results of topic extraction experiments using continuous speech recognition on Japanese broadcast news utterances. We represent the topic of a news story as a combination of multiple topic words, which are important words in the news article or words relevant to the news. We statistically model the mapping from words in an article to topic words. Using this mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model that computes the degree of relevance between a topic word and a word in an article, using newspaper text covering a five-year period. The degree of relevance between those words is calculated from measures such as mutual information or the χ² method. In experiments extracting five topic words with a χ²-based model, we achieve 72% precision and 12% recall on speech recognition results. Speech recognition results generally include a number of recognition errors, which degrade topic extraction performance. To mitigate this, we employ N-best candidates and the likelihoods given by the acoustic and language models. In experiments, we find that extracting five topic words using N-best candidates and likelihood values significantly improves precision.
ER -