Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones

Weifeng LI; Tetsuya SHINDE; Hiroshi FUJIMURA; Chiyomi MIYAJIMA; Takanori NISHINO; Katunobu ITOU; Kazuya TAKEDA; Fumitada ITAKURA

doi:10.1093/ietisy/e88-d.3.384

IEICE TRANSACTIONS on Information


Summary:

This paper describes a new multi-channel method for noisy speech recognition that estimates the log spectrum of speech at a close-talking microphone by multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The proposed method has three advantages: 1) it requires neither a sensitive geometric layout, nor calibration of the sensors, nor additional pre-processing to track the speech source; 2) it requires very little computation; and 3) the regression weights can be statistically optimized over the given training data. Once the optimal regression weights have been obtained by regression learning, they can be used to generate the estimated log spectrum in the recognition phase, where the close-talking speech is no longer required. The performance of the proposed method is illustrated by speech recognition experiments on real in-car dialogue data. Compared with the nearest distant microphone and a multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
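
The regression idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one independent linear regression per frequency bin with a bias term, fitted by ordinary least squares, and the function names (fit_mrls, apply_mrls), the array layout, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_mrls(distant_logspec, close_logspec):
    """Fit per-bin regression weights by ordinary least squares.

    distant_logspec: array (frames, mics, bins), log spectra of the distant mics.
    close_logspec:   array (frames, bins), log spectrum at the close-talking mic.
    Returns weights of shape (bins, mics + 1); the last column is the bias.
    (Assumed layout; the paper's exact features and model may differ.)
    """
    frames, mics, bins = distant_logspec.shape
    weights = np.empty((bins, mics + 1))
    ones = np.ones((frames, 1))
    for k in range(bins):
        # Design matrix for bin k: one column per microphone plus a constant.
        design = np.hstack([distant_logspec[:, :, k], ones])
        weights[k], *_ = np.linalg.lstsq(design, close_logspec[:, k], rcond=None)
    return weights


def apply_mrls(distant_logspec, weights):
    """Estimate the close-talking log spectrum from the distant mics only."""
    # Weighted sum over microphones for every frame and bin, then add the bias.
    return np.einsum("fmk,km->fk", distant_logspec, weights[:, :-1]) + weights[:, -1]


if __name__ == "__main__":
    # Synthetic stand-in data: 200 frames, 5 microphones, 4 frequency bins.
    rng = np.random.default_rng(0)
    distant = rng.normal(size=(200, 5, 4))
    true_weights = rng.normal(size=(4, 6))
    close = apply_mrls(distant, true_weights)

    learned = fit_mrls(distant, close)       # training: close-talking speech needed
    estimate = apply_mrls(distant, learned)  # recognition: distant mics only
    print("max abs error:", np.abs(estimate - close).max())
```

In this sketch the close-talking signal is used only in fit_mrls; apply_mrls needs just the distant-microphone log spectra and one weighted sum per bin, which mirrors the low computation cost and the training/recognition split described in the summary.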

Publication: IEICE TRANSACTIONS on Information Vol.E88-D No.3 pp.384-390

Publication Date: 2005/03/01

DOI: 10.1093/ietisy/e88-d.3.384

Type of Manuscript: Special Section PAPER (Special Section on Corpus-Based Speech Technologies)

Category: Feature Extraction and Acoustic Modeling

Weifeng LI, Tetsuya SHINDE, Hiroshi FUJIMURA, Chiyomi MIYAJIMA, Takanori NISHINO, Katunobu ITOU, Kazuya TAKEDA, Fumitada ITAKURA, "Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones" in IEICE TRANSACTIONS on Information, vol. E88-D, no. 3, pp. 384-390, March 2005, doi: 10.1093/ietisy/e88-d.3.384.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e88-d.3.384/_p

@ARTICLE{e88-d_3_384,
author={Weifeng LI and Tetsuya SHINDE and Hiroshi FUJIMURA and Chiyomi MIYAJIMA and Takanori NISHINO and Katunobu ITOU and Kazuya TAKEDA and Fumitada ITAKURA},
journal={IEICE TRANSACTIONS on Information},
title={Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones},
year={2005},
volume={E88-D},
number={3},
pages={384-390},
abstract={This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) The method does not require a sensitive geometric layout, calibration of the sensors nor additional pre-processing for tracking the speech source; 2) System works in very small computation amounts; and 3) Regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be utilized to generate the estimated log spectrum in the recognition phase, where the speech of close-talking is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.},
keywords={},
doi={10.1093/ietisy/e88-d.3.384},
ISSN={},
month={March},}

TY - JOUR
TI - Multiple Regression of Log Spectra for In-Car Speech Recognition Using Multiple Distributed Microphones
T2 - IEICE TRANSACTIONS on Information
SP - 384
EP - 390
AU - Weifeng LI
AU - Tetsuya SHINDE
AU - Hiroshi FUJIMURA
AU - Chiyomi MIYAJIMA
AU - Takanori NISHINO
AU - Katunobu ITOU
AU - Kazuya TAKEDA
AU - Fumitada ITAKURA
PY - 2005
DO - 10.1093/ietisy/e88-d.3.384
JO - IEICE TRANSACTIONS on Information
SN -
VL - E88-D
IS - 3
JA - IEICE TRANSACTIONS on Information
Y1 - March 2005
AB - This paper describes a new multi-channel method of noisy speech recognition, which estimates the log spectrum of speech at a close-talking microphone based on the multiple regression of the log spectra (MRLS) of noisy signals captured by distributed microphones. The advantages of the proposed method are as follows: 1) The method does not require a sensitive geometric layout, calibration of the sensors nor additional pre-processing for tracking the speech source; 2) System works in very small computation amounts; and 3) Regression weights can be statistically optimized over the given training data. Once the optimal regression weights are obtained by regression learning, they can be utilized to generate the estimated log spectrum in the recognition phase, where the speech of close-talking is no longer required. The performance of the proposed method is illustrated by speech recognition of real in-car dialogue data. In comparison to the nearest distant microphone and multi-microphone adaptive beamformer, the proposed approach obtains relative word error rate (WER) reductions of 9.8% and 3.6%, respectively.
ER -
