Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News

Toru IMAI; Shoei SATO; Shinichi HOMMA; Kazuo ONOE; Akio KOBAYASHI

doi:10.1093/ietisy/e90-d.8.1286

IEICE TRANSACTIONS on Information

Summary :

This paper describes a new method for detecting speech segments online while identifying gender attributes, enabling efficient dual gender-dependent speech recognition for broadcast news captioning. The proposed online speech detector performs dual-gender phoneme recognition and detects the start-point and end-point of each segment from the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood, with a very small delay from the audio input. While obtaining the speech segments, the phoneme recognizer also identifies gender attributes with high discrimination, which guide the subsequent dual-gender continuous speech recognizer efficiently. As soon as a start-point is detected, the continuous speech recognizer, which runs gender-dependent acoustic models in parallel, starts its search and allows search transitions between the male and female models within a speech segment based on the gender attributes. Speech recognition experiments on conversational commentaries and field reporting from Japanese broadcast news showed that, compared with a conventional method using adaptive energy thresholds, the proposed speech detection method reduced the false rejection rate from 4.6% to 0.53% and also reduced recognition errors. It was also effective in identifying gender attributes, with a correct rate of 99.7% of words. With the new speech detection and gender identification, the proposed dual-gender speech recognition significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost feasible for real-time operation.
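The summary gives no formulas, so the following is only a minimal sketch of how endpoint detection from a cumulative likelihood ratio could work, not the authors' implementation. It assumes that some front end already supplies per-frame log-likelihoods for the best male phoneme, the best female phoneme, and non-speech. The names `Frame`, `Segment`, and `detect_segments`, the thresholds, and the floor-at-zero accumulation rule are all hypothetical. As a further simplification, the sketch assigns one gender label per segment, whereas the paper allows male/female transitions within a segment.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Frame:
    male: float       # log-likelihood of the best male phoneme hypothesis (assumed input)
    female: float     # log-likelihood of the best female phoneme hypothesis (assumed input)
    nonspeech: float  # log-likelihood of the non-speech model (assumed input)


@dataclass
class Segment:
    start: int   # index of the first frame of the segment
    end: int     # index of the last speech-like frame of the segment
    gender: str  # "male" or "female", from cumulative evidence


def detect_segments(frames: Iterable[Frame],
                    start_threshold: float = 5.0,
                    end_threshold: float = 12.0) -> List[Segment]:
    """Frame-synchronous start/end-point detection (illustrative thresholds)."""
    segments: List[Segment] = []
    in_speech = False
    speech_score = 0.0   # cumulative speech vs. non-speech log ratio, floored at 0
    silence_score = 0.0  # cumulative non-speech vs. speech log ratio, floored at 0
    gender_score = 0.0   # cumulative male minus female log-likelihood
    onset = 0            # candidate start-point
    start = 0
    last_speech = 0      # candidate end-point

    for t, f in enumerate(frames):
        llr = max(f.male, f.female) - f.nonspeech
        if not in_speech:
            if speech_score + llr <= 0.0:
                # Evidence for speech vanished: restart accumulation.
                speech_score = 0.0
                gender_score = 0.0
                onset = t + 1
                continue
            speech_score += llr
            gender_score += f.male - f.female
            if speech_score >= start_threshold:
                in_speech = True
                start = onset
                last_speech = t
                silence_score = 0.0
        else:
            gender_score += f.male - f.female
            silence_score = max(0.0, silence_score - llr)
            if silence_score == 0.0:
                last_speech = t
            elif silence_score >= end_threshold:
                gender = "male" if gender_score >= 0.0 else "female"
                segments.append(Segment(start, last_speech, gender))
                in_speech = False
                speech_score = 0.0
                gender_score = 0.0
                onset = t + 1

    if in_speech:
        # Stream ended inside a segment: close it at the last speech-like frame.
        gender = "male" if gender_score >= 0.0 else "female"
        segments.append(Segment(start, last_speech, gender))
    return segments


if __name__ == "__main__":
    silence = Frame(male=-10.0, female=-10.0, nonspeech=-5.0)
    male_speech = Frame(male=-2.0, female=-6.0, nonspeech=-8.0)
    female_speech = Frame(male=-6.0, female=-2.0, nonspeech=-8.0)
    stream = [silence] * 3 + [male_speech] * 5 + [silence] * 4 \
        + [female_speech] * 5 + [silence] * 3
    for seg in detect_segments(stream):
        print(seg)
```

On the synthetic stream above, the sketch reports a male segment covering frames 3 to 7 and a female segment covering frames 12 to 16. Because the start-point decision needs only the frames seen so far, a downstream recognizer could begin its search as soon as `start_threshold` is crossed, which is the low-delay property the summary emphasizes.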

Publication: IEICE TRANSACTIONS on Information Vol.E90-D No.8 pp.1286-1291

Publication Date: 2007/08/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e90-d.8.1286

Type of Manuscript: PAPER

Category: Speech and Hearing

Toru IMAI, Shoei SATO, Shinichi HOMMA, Kazuo ONOE, Akio KOBAYASHI, "Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News" in IEICE TRANSACTIONS on Information, vol. E90-D, no. 8, pp. 1286-1291, August 2007, doi: 10.1093/ietisy/e90-d.8.1286.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.8.1286/_p

@ARTICLE{e90-d_8_1286,
author={Toru IMAI and Shoei SATO and Shinichi HOMMA and Kazuo ONOE and Akio KOBAYASHI},
journal={IEICE TRANSACTIONS on Information},
title={Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News},
year={2007},
volume={E90-D},
number={8},
pages={1286-1291},
doi={10.1093/ietisy/e90-d.8.1286},
ISSN={1745-1361},
month={August},}

TY - JOUR
TI - Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News
T2 - IEICE TRANSACTIONS on Information
SP - 1286
EP - 1291
AU - Toru IMAI
AU - Shoei SATO
AU - Shinichi HOMMA
AU - Kazuo ONOE
AU - Akio KOBAYASHI
PY - 2007
DO - 10.1093/ietisy/e90-d.8.1286
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E90-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 2007
ER -
