Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition

Jwu-Sheng HU; Chieh-Cheng CHENG

doi:10.1093/ietfec/e88-a.9.2401

Summary:

This investigation proposes two array beamformers: SPFDBB (Soft Penalty Frequency Domain Block Beamformer) and FDABB (Frequency Domain Adjustable Block Beamformer). Compared with conventional beamformers, these frequency-domain methods can significantly reduce the computational load in ASR (Automatic Speech Recognition) applications. Like other reference-signal-based techniques, SPFDBB and FDABB mitigate microphone mismatch, desired-signal cancellation caused by reflections, and resolution limitations due to the array's position. The proposed methods are also suitable for both near-field and far-field environments. In general, the time-domain convolution between the channel and the speech source cannot be modeled accurately as a multiplication in the frequency domain when the window size is finite, especially in ASR applications. SPFDBB and FDABB approximate this multiplication by treating several frames as a block, which yields a better beamforming result. Moreover, FDABB adjusts the number of frames online to cope with variation in the characteristics of both the speech and interference signals. Combining these methods with an ASR mechanism was found to yield better performance.
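The finite-window limitation mentioned in the summary can be illustrated with a short NumPy sketch. This is not the authors' SPFDBB or FDABB algorithm. It only measures how far a single frame of a convolved signal deviates from the per-bin product of the channel spectrum and the source-frame spectrum. The white-noise source, the randomly generated decaying channel, and the function name `multiplicative_model_error` are all assumptions made for illustration.

```python
import numpy as np


def multiplicative_model_error(frame_len, channel_len=64, seed=0):
    """Relative error of modeling one frame of a convolution as H(k) * X(k).

    All signals here are synthetic stand-ins (assumptions), not data
    from the paper.
    """
    if frame_len < channel_len:
        raise ValueError("frame_len must be at least channel_len")
    rng = np.random.default_rng(seed)
    # Hypothetical channel: random taps with an exponentially decaying envelope.
    channel = rng.standard_normal(channel_len) * np.exp(-np.arange(channel_len) / 16.0)
    # Hypothetical source: white noise, long enough to take a frame far from the edges.
    source = rng.standard_normal(8 * frame_len)
    # True time-domain relation: linear convolution of source and channel.
    observed = np.convolve(source, channel)[: source.size]

    start = 4 * frame_len
    src_frame = source[start : start + frame_len]
    obs_frame = observed[start : start + frame_len]

    # Frequency-domain multiplicative model for this single frame.
    channel_spectrum = np.fft.rfft(channel, n=frame_len)
    predicted = channel_spectrum * np.fft.rfft(src_frame)
    actual = np.fft.rfft(obs_frame)
    return np.linalg.norm(actual - predicted) / np.linalg.norm(actual)


if __name__ == "__main__":
    for n in (64, 256, 1024, 4096):
        print(n, multiplicative_model_error(n))
```

In this sketch the mismatch comes from samples before the frame that leak into it through the channel, which the per-frame product replaces with wrapped-around samples. Lengthening the window relative to the channel reduces the relative error, which is consistent with the summary's motivation for processing several frames together as a block.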

Publication: IEICE TRANSACTIONS on Fundamentals Vol.E88-A No.9 pp.2401-2411

Publication Date: 2005/09/01

DOI: 10.1093/ietfec/e88-a.9.2401

Type of Manuscript: PAPER

Category: Noise and Vibration

Jwu-Sheng HU, Chieh-Cheng CHENG, "Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition" in IEICE TRANSACTIONS on Fundamentals, vol. E88-A, no. 9, pp. 2401-2411, September 2005, doi: 10.1093/ietfec/e88-a.9.2401.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e88-a.9.2401/_p

@ARTICLE{e88-a_9_2401,
author={Jwu-Sheng HU and Chieh-Cheng CHENG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition},
year={2005},
volume={E88-A},
number={9},
pages={2401--2411},
keywords={},
doi={10.1093/ietfec/e88-a.9.2401},
ISSN={},
month={September},}

TY - JOUR
TI - Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2401
EP - 2411
AU - Jwu-Sheng HU
AU - Chieh-Cheng CHENG
PY - 2005
DO - 10.1093/ietfec/e88-a.9.2401
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E88-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2005
ER -