
Keyword Search Result

[Keyword] microphone (72 hits)

Results 21-40 of 72

  • Distant Speech Recognition Using a Microphone Array Network

    Alberto Yoshihiro NAKANO  Seiichi NAKAGAWA  Kazumasa YAMAMOTO  

     
    PAPER-Microphone Array

    Vol: E93-D No:9, Page(s): 2451-2462

    In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. The orientation angle, in turn, is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
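    As a rough illustration of the delay-and-sum step mentioned above, the sketch below aligns the channels toward an estimated source position and averages them. It is a generic free-field sketch, not the authors' implementation; the microphone geometry, sample rate, and speed of sound are assumed inputs.

      import numpy as np

      def delay_and_sum(signals, mic_pos, src_pos, fs, c=343.0):
          """Align multichannel signals toward src_pos and average them.

          signals : (n_mics, n_samples) array
          mic_pos : (n_mics, 3) microphone coordinates in metres
          src_pos : (3,) estimated source position in metres
          """
          dists = np.linalg.norm(mic_pos - src_pos, axis=1)   # source-to-mic distances
          delays = (dists - dists.min()) / c                  # relative delays in seconds
          n = signals.shape[1]
          freqs = np.fft.rfftfreq(n, d=1.0 / fs)
          out = np.zeros(n)
          for x, tau in zip(signals, delays):
              # advance each channel by its relative delay (fractional delay as a phase shift)
              out += np.fft.irfft(np.fft.rfft(x) * np.exp(2j * np.pi * freqs * tau), n=n)
          return out / len(signals)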

  • Multiple Sound Source Localization Based on Inter-Channel Correlation Using a Distributed Microphone System in a Real Environment

    Kook CHO  Hajime OKUMURA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Microphone Array

    Vol: E93-D No:9, Page(s): 2463-2471

    In real environments, ambient noise and room reverberation seriously degrade the accuracy of sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method of multiple sound source localization using a distributed microphone system, that is, a recording system with multiple microphones dispersed over a wide area. The proposed method localizes a sound source by finding the position that maximizes the accumulated correlation coefficient over multiple channel pairs. After the first sound source has been estimated, a typical pattern of the accumulated correlation for a single source is subtracted from the observed distribution, and the second sound source is then searched for. To evaluate the effectiveness of the proposed method, two-source localization experiments were carried out in an office room. The results show a sound source localization accuracy of about 99.7%, demonstrating that the proposed method realizes multiple sound source localization robustly and stably.
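    A minimal sketch of the accumulated-correlation search for the first source is given below (the subtraction step for the second source is omitted). It is a simplification under stated assumptions: the candidate positions, sample rate, and sound speed are supplied by the caller, and the lag-sign convention may need adjusting for a given geometry.

      import numpy as np

      def localize_first_source(signals, mic_pos, candidates, fs, c=343.0):
          """Score each candidate position by accumulating the normalized
          inter-channel correlation at the lags that position implies."""
          n_mics, n = signals.shape
          pairs = [(i, j) for i in range(n_mics) for j in range(i + 1, n_mics)]
          xcorr = {}
          for i, j in pairs:                              # precompute correlation coefficients
              a = signals[i] - signals[i].mean()
              b = signals[j] - signals[j].mean()
              cc = np.correlate(a, b, mode="full")
              xcorr[(i, j)] = cc / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
          scores = np.zeros(len(candidates))
          for k, p in enumerate(candidates):
              dist = np.linalg.norm(mic_pos - p, axis=1)
              for i, j in pairs:
                  lag = int(round((dist[j] - dist[i]) / c * fs))   # expected sample lag for this pair
                  scores[k] += xcorr[(i, j)][n - 1 + lag]          # accumulate correlation at that lag
          return candidates[int(np.argmax(scores))], scores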

  • Speech Enhancement Using a Square Microphone Array in the Presence of Directional and Diffuse Noise

    Tetsuji OGAWA  Shintaro TAKADA  Kenzo AKAGIRI  Tetsunori KOBAYASHI  

     
    PAPER-Speech and Hearing

    Vol: E93-A No:5, Page(s): 926-935

    We propose a new speech enhancement method suitable for mobile devices used in the presence of various types of noise. In order to achieve high-performance speech recognition and auditory perception on mobile devices, various types of noise have to be removed under the constraints of a space-saving microphone arrangement and limited computational resources. The proposed method reduces both directional noise and diffuse noise under these constraints by employing a square microphone array and conducting low-computational-cost processing that consists of multiple null beamforming, minimum power channel selection, and Wiener filtering. The effectiveness of the proposed method is experimentally verified in terms of speech recognition accuracy and speech quality when both directional and diffuse noise are observed simultaneously; the method reduces the number of word errors and improves the log-spectral distances compared to conventional methods.
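    The selection-plus-Wiener stage described above can be pictured with the short sketch below. It assumes the null-beamformer outputs and a noise power estimate are already available in the STFT domain, and it only illustrates the idea, not the authors' exact processing.

      import numpy as np

      def select_and_wiener(beam_outputs, noise_psd):
          """Per frequency bin, keep the null-beamformer output with minimum
          power, then apply a Wiener gain against an estimated noise PSD.

          beam_outputs : (n_beams, n_bins) complex STFT outputs of the null beamformers
          noise_psd    : (n_bins,) estimated noise power spectrum
          """
          power = np.abs(beam_outputs) ** 2
          pick = np.argmin(power, axis=0)                               # min-power beam per bin
          selected = beam_outputs[pick, np.arange(beam_outputs.shape[1])]
          snr = np.maximum(np.abs(selected) ** 2 - noise_psd, 0.0) / (noise_psd + 1e-12)
          return (snr / (1.0 + snr)) * selected                         # Wiener-filtered spectrum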

  • Probabilistic Adaptation Mode Control Algorithm for GSC-Based Noise Reduction

    Seungho HAN  Jungpyo HONG  Sangbae JEONG  Minsoo HAHN  

     
    LETTER-Speech and Hearing

    Vol: E93-A No:3, Page(s): 627-630

    An efficient noise reduction algorithm is proposed to improve speech recognition performance for human-machine interfaces. In the algorithm, a probabilistic adaptation mode controller (AMC) is designed and applied to the generalized sidelobe canceller (GSC). To detect target speech intervals, the proposed AMC calculates the inter-channel correlation and estimates the speech absence probability (SAP). Based on the SAP, the adaptation mode of the adaptive filter in the GSC is decided. Experimental results show that the proposed algorithm significantly improves speech recognition performance and signal-to-noise ratios in real noisy environments.
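    The adaptation-mode idea can be sketched as follows: per frame, the inter-channel correlation is turned into a rough speech-absence score, and the GSC's adaptive filter is allowed to update only when speech is judged absent. The probability mapping below is a crude stand-in for the paper's SAP estimator, and the threshold is an assumed value.

      import numpy as np

      def allow_adaptation(frame_ch1, frame_ch2, sap_threshold=0.5):
          """Return True if the GSC adaptive filter may update on this frame.

          High inter-channel correlation suggests target speech is present, so
          adaptation is frozen to avoid cancelling the target; it resumes on
          frames judged to be noise-only."""
          a = frame_ch1 - frame_ch1.mean()
          b = frame_ch2 - frame_ch2.mean()
          rho = abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
          speech_absence = 1.0 - rho            # crude proxy for the speech absence probability
          return speech_absence > sap_threshold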

  • A Single-Chip Speech Dialogue Module and Its Evaluation on a Personal Robot, PaPeRo-Mini

    Miki SATO  Toru IWASAWA  Akihiko SUGIYAMA  Toshihiro NISHIZAWA  Yosuke TAKANO  

     
    PAPER-Digital Signal Processing

    Vol: E93-A No:1, Page(s): 261-271

    This paper presents a single-chip speech dialogue module and its evaluation on a personal robot. The module is implemented on an application processor that was developed primarily for mobile phones to provide compact size, low power consumption, and low cost. It performs speech recognition with preprocessing functions such as direction-of-arrival (DOA) estimation, noise cancellation, beamforming with a microphone array, and echo cancellation, and is also equipped with text-to-speech (TTS) conversion. Evaluation results obtained on a new personal robot, PaPeRo-mini, a scaled-down version of PaPeRo, demonstrate an 85% correct rate in DOA estimation, and as much as 54% and 30% higher speech recognition rates in noisy environments and during robot utterances, respectively. These results are comparable to those obtained with PaPeRo.

  • Robust Relative Transfer Function Estimation for Dual Microphone-Based Generalized Sidelobe Canceller

    Kihyeon KIM  Hanseok KO  

     
    LETTER-Speech and Hearing

    Vol: E92-D No:9, Page(s): 1794-1797

    In this Letter, a robust system identification method is proposed for the generalized sidelobe canceller using dual microphones. The conventional transfer-function generalized sidelobe canceller relies on the non-stationarity of the speech signal to estimate the relative transfer function and is therefore difficult to apply when the noise is also non-stationary. Under the assumption of W-disjoint orthogonality between the speech and the non-stationary noise, the proposed algorithm finds the speech-dominant time-frequency bins of the input signal by inspecting the system output and the inter-microphone time delay. Only these bins are used to estimate the relative transfer function, so reliable estimates can be obtained under non-stationary noise conditions. The experimental results show that the proposed algorithm significantly improves the performance of the transfer-function generalized sidelobe canceller while sustaining only a modest estimation error in adverse non-stationary noise environments.
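    A rough sketch of the bin-selection idea follows: time-frequency bins whose inter-microphone phase matches the target's expected delay are treated as speech-dominant, and only those would feed the relative-transfer-function update. The tolerance and the plain phase test are assumptions for illustration; the Letter additionally inspects the system output.

      import numpy as np

      def speech_dominant_bins(X1, X2, fs, target_tdoa, tol_rad=0.3):
          """Mask of STFT bins whose inter-microphone phase is consistent with
          the target delay (W-disjoint orthogonality assumption).

          X1, X2      : complex STFT frames of the two microphones, shape (n_bins,)
          target_tdoa : target time difference of arrival in seconds
          """
          n_bins = X1.shape[0]
          freqs = np.linspace(0.0, fs / 2.0, n_bins)
          observed = np.angle(X1 * np.conj(X2))                      # observed inter-mic phase
          expected = 2.0 * np.pi * freqs * target_tdoa               # phase implied by the target TDOA
          deviation = np.angle(np.exp(1j * (observed - expected)))   # wrapped difference
          return np.abs(deviation) < tol_rad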

  • Multi-Input Feature Combination in the Cepstral Domain for Practical Speech Recognition Systems

    Yasunari OBUCHI  Nobuo HATAOKA  

     
    PAPER-Speech and Hearing

    Vol: E92-D No:4, Page(s): 662-670

    In this paper we describe a new framework of feature combination in the cepstral domain for multi-input robust speech recognition. The general framework of working in the cepstral domain has various advantages over working in the time or hypothesis domain: it is stable, easy to maintain, and less expensive because it does not require precise calibration, and it is easy to configure in a complex speech recognition system. However, it is not straightforward to improve recognition performance simply by increasing the number of inputs, and we introduce the concept of variance re-scaling to compensate for the negative effect of averaging several input features. Finally, we exploit another advantage of working in the cepstral domain: the speech can be modeled using hidden Markov models, and the model can be used as prior knowledge. This approach is formulated as a new algorithm, referred to as Hypothesis-Based Feature Combination. The effectiveness of the various algorithms is evaluated using two sets of speech databases. We also describe automatic optimization of some parameters in the proposed algorithms.
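    The variance re-scaling idea can be illustrated with the sketch below: averaging several roughly independent cepstral streams shrinks their variance, so the averaged stream is re-scaled toward a reference spread. The choice of reference here is an assumption; the paper's weighting and its hypothesis-based variant are not reproduced.

      import numpy as np

      def combine_cepstra(features):
          """Average cepstral feature streams from several microphones and
          re-scale the variance of the result.

          features : list of (n_frames, n_ceps) arrays, one per input channel
          """
          stacked = np.stack(features)                 # (n_channels, n_frames, n_ceps)
          avg = stacked.mean(axis=0)                   # plain averaging shrinks the variance
          ref_std = stacked.std(axis=(0, 1))           # reference spread taken from the inputs
          mu = avg.mean(axis=0)
          scale = ref_std / (avg.std(axis=0) + 1e-12)  # per-coefficient re-scaling factor
          return mu + (avg - mu) * scale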

  • A Fully On-Chip Gm-Opamp-RC Based Preamplifier for Electret Condenser Microphones

    Huy-Binh LE  Seung-Tak RYU  Sang-Gug LEE  

     
    LETTER-Electronic Circuits

    Vol: E92-C No:4, Page(s): 587-588

    An on-chip CMOS preamplifier for direct signal readout from an electret condenser microphone has been designed with high immunity to common-mode and supply noise. The Gm-Opamp-RC based high-impedance preamplifier removes the disadvantages of the conventional JFET-based amplifier and can drive a following switched-capacitor sigma-delta modulator to realize a compact digital electret microphone. The proposed chip is designed in a 0.18 µm CMOS technology, and simulation results show 86 dB of dynamic range with 4.5 µVrms of input-referred noise over an audio bandwidth of 20 kHz and a total harmonic distortion (THD) of 1% at a 90 mVrms input. The power supply rejection ratio (PSRR) and common-mode rejection ratio (CMRR) are more than 95 dB at 1 kHz. The design draws 125 µA and operates over a wide supply voltage range of 1.6 V to 3.3 V.

  • A Robust Sound Source Localization Approach for Microphone Array with Model Errors

    Hua XIAO  Huai-Zong SHAO  Qi-Cong PENG  

     
    PAPER-Speech and Hearing

    Vol: E91-A No:8, Page(s): 2062-2067

    In this paper, a robust sound source localization approach is proposed. The approach retains good performance even when model errors exist. Compared with previous work in this field, the contributions of this paper are as follows. First, an improved broad-band, near-field array model is proposed. It takes array gain and phase perturbations into account, is based on the actual positions of the elements, and can be used with arbitrary planar array geometries. Second, a subspace model-error estimation algorithm and a Weighted 2-Dimension Multiple Signal Classification (W2D-MUSIC) algorithm are proposed. The subspace model-error estimation algorithm estimates the unknown parameters of the array model, i.e., gain, phase perturbations, and element positions, with high accuracy, and its performance improves as the SNR or the number of snapshots increases. The W2D-MUSIC algorithm, based on the improved array model, is used to locate sound sources. These two algorithms compose the robust sound source localization approach. The resulting, more accurate steering vectors can also be provided for further processing such as adaptive beamforming. Numerical examples confirm the effectiveness of the proposed approach.

  • Enhancement of Sound Sources Located within a Particular Area Using a Pair of Small Microphone Arrays

    Yusuke HIOKA  Kazunori KOBAYASHI  Ken'ichi FURUYA  Akitoshi KATAOKA  

     
    PAPER-Engineering Acoustics

    Vol: E91-A No:2, Page(s): 561-574

    A method for extracting a sound signal from a particular area that is surrounded by multiple ambient noise sources is proposed. The method performs several fixed beamformings on a pair of small microphone arrays separated from each other to estimate the signal and noise power spectra. Noise suppression is achieved by applying a spectral emphasis, derived from the estimated power spectra, to the output of the fixed beamforming in the frequency domain. In experiments performed in a reverberant room, the method succeeded in suppressing the ambient noise with an SNR improvement of more than 10 dB, which is better than the performance of conventional fixed and adaptive beamforming methods using a large-aperture microphone array. We also confirmed that the method maintains its performance even if the noise source location changes continuously or abruptly.

  • An Approach to Solve Local Minimum Problem in Sound Source and Microphone Localization

    Kazunori KOBAYASHI  Ken'ichi FURUYA  Yoichi HANEDA  Akitoshi KATAOKA  

     
    PAPER-Engineering Acoustics

    Vol: E90-A No:12, Page(s): 2826-2834

    We previously proposed a method of sound source and microphone localization that estimates the locations of sound sources and microphones from only the time differences of arrival between the signals picked up by the microphones, even when all of the locations are unknown. However, some estimates converge to local minimum solutions because the method estimates the locations iteratively and the error function has multiple minima. In this paper, we present a new iterative method to solve this local minimum problem. The method achieves accurate estimation by selecting effective initial locations from many random initial locations. Computer simulation and experimental results demonstrate that the presented method eliminates most local minimum solutions, while its computational complexity remains similar to that of the previous method.
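    The multi-start idea can be pictured with the generic sketch below, which restarts a local optimizer from many random initial geometries and keeps the best fit; the paper's procedure for selecting effective initial locations is more elaborate. The error function, parameter bounds, and optimizer choice are all assumptions.

      import numpy as np
      from scipy.optimize import minimize

      def multi_start_localize(tdoa_error, n_params, n_starts=50, lo=-5.0, hi=5.0, seed=0):
          """Minimize a TDOA-mismatch error from many random initial locations.

          tdoa_error : callable mapping a stacked vector of source and microphone
                       coordinates to a scalar error; n_params is its length.
          """
          rng = np.random.default_rng(seed)
          best = None
          for _ in range(n_starts):
              x0 = rng.uniform(lo, hi, size=n_params)            # random initial geometry
              res = minimize(tdoa_error, x0, method="Nelder-Mead")
              if best is None or res.fun < best.fun:
                  best = res                                      # keep the lowest-error solution
          return best.x, best.fun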

  • Text-Independent Speaker Identification in a Distant-Talking Multi-Microphone Environment

    Mikyong JI  Sungtak KIM  Hoirin KIM  

     
    LETTER-Speech and Hearing

    Vol: E90-D No:11, Page(s): 1892-1895

    With the aim of improving speaker identification, we propose a likelihood-based integration method that combines the speaker identification results obtained through multiple microphones. In many cases, the composite result has a lower error rate than that of any single channel. The proposed integration method achieves more reliable identification performance in the ubiquitous robot companion (URC) environment, in which the robot is connected to a server over a broadband network.
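    A minimal sketch of likelihood-based integration is shown below: per-channel log-likelihoods of each enrolled speaker are summed across microphones and the arg max is taken. Any channel weighting used in the letter is omitted; the equal-weight sum is an assumption.

      import numpy as np

      def integrate_channels(log_likelihoods):
          """Combine per-microphone speaker scores by summing log-likelihoods.

          log_likelihoods : (n_channels, n_speakers) array, the log-likelihood of
          each enrolled speaker model given each channel's observation.
          """
          combined = np.asarray(log_likelihoods).sum(axis=0)   # equal-weight fusion
          return int(np.argmax(combined)), combined            # identified speaker index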

  • Environmentally Robust Electret Condenser Microphone

    Yoshinobu YASUNO  Yasuhiro RIKO  Nobuhiro FUNAKOSHI  Takeshi SHIMIZU  Goro YAMAUCHI  

     
    LETTER-Engineering Acoustics

    Vol: E89-A No:8, Page(s): 2226-2229

    We introduce a new electret condenser microphone (ECM) with a water-repellent coating structure for protection against common hazards such as water or alcohol. The protection structure consists of small acoustical holes with a water-repellent coating. The coating has a contact angle of more than 150 degrees for water on an acoustical hole with an aperture of less than 0.2 mm, which blocks water ingress but allows acoustic transmission. The reliability of the coating was confirmed by several tests, such as long-term immersion in water and alcohol, reflow soldering, and surface scratching; these tests produced no damage to the coating. The fabricated ECM meets the requirements of IEC 60529 class 7, i.e., 30 minutes under water at a depth of 1 meter. The diameter and number of holes are determined by both the acoustic characteristics and the water resistance.

  • Robust Talker Direction Estimation Based on Weighted CSP Analysis and Maximum Likelihood Estimation

    Yuki DENDA  Takanobu NISHIURA  Yoichi YAMASHITA  

     
    PAPER-Speech Enhancement

    Vol: E89-D No:3, Page(s): 1050-1057

    This paper describes a new talker direction estimation method for front-end processing to capture distant-talking speech with a microphone array. The proposed method consists of two algorithms: a TDOA (Time Delay Of Arrival) estimation algorithm based on weighted CSP (Cross-power Spectrum Phase) analysis with an average speech spectrum and CSP coefficient subtraction, and a talker direction estimation algorithm based on ML (Maximum Likelihood) estimation over a time sequence of the estimated TDOAs. To evaluate the effectiveness of the proposed method, talker direction estimation experiments were carried out in an actual office room. The results confirm that the talker direction estimation performance of the proposed method is superior to that of conventional methods in both diffuse- and directional-noise environments.
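    The CSP (cross-power spectrum phase) part of the method can be sketched as below: the whitened cross-spectrum is optionally multiplied by a frequency weight (for example an average speech spectrum) before the inverse transform, and the peak lag gives the TDOA. The weighting, padding, and peak picking are generic assumptions; the paper's CSP coefficient subtraction and ML stage are not shown.

      import numpy as np

      def weighted_csp_tdoa(x1, x2, fs, weight=None):
          """Estimate the TDOA between two channels with (optionally weighted) CSP analysis."""
          n = len(x1) + len(x2)
          X1 = np.fft.rfft(x1, n)
          X2 = np.fft.rfft(x2, n)
          cross = X1 * np.conj(X2)
          csp = cross / (np.abs(cross) + 1e-12)       # phase transform (CSP) normalisation
          if weight is not None:                      # e.g. an average speech spectrum, shape (n//2 + 1,)
              csp = csp * weight
          cc = np.fft.irfft(csp, n)
          k = int(np.argmax(cc))
          lag = k if k <= n // 2 else k - n           # interpret the circular index as a signed lag
          return lag / fs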

  • Robust Beamforming of Microphone Array Using H∞ Adaptive Filtering Technique

    Jwu-Sheng HU  Wei-Han LIU  Chieh-Cheng CHENG  

     
    PAPER-Speech/Audio Processing

    Vol: E89-A No:3, Page(s): 708-715

    In ASR (Automatic Speech Recognition) applications, one of the most important issues in the real-time beamforming of microphone arrays is the inability to capture the whole acoustic dynamics with a finite length of data and a finite number of array elements. For example, a reflected source signal impinging from a side-lobe direction presents a coherent interference, and non-minimum-phase channel dynamics may require an infinite amount of data to achieve perfect equalization (or inversion). All these factors appear as uncertainties or un-modeled dynamics in the received signals. Traditional adaptive algorithms such as NLMS that do not consider these errors suffer performance deterioration. In this paper, a time-domain beamformer using an H∞ filtering approach is proposed to adjust the beamforming parameters. This work also proposes a frequency-domain approach, SPFDBB (Soft Penalty Frequency Domain Block Beamformer), based on H∞ filtering, which reduces the computational effort and provides purified data to the ASR application. Experimental results show that the adaptive H∞ filtering method is robust to the modeling errors and suppresses much more noise interference than the NLMS-based method. Consequently, the ASR correct rate is also enhanced.

  • Frequency Domain Microphone Array Calibration and Beamforming for Automatic Speech Recognition

    Jwu-Sheng HU  Chieh-Cheng CHENG  

     
    PAPER-Noise and Vibration

    Vol: E88-A No:9, Page(s): 2401-2411

    This investigation proposes two array beamformers, SPFDBB (Soft Penalty Frequency Domain Block Beamformer) and FDABB (Frequency Domain Adjustable Block Beamformer). Compared with conventional beamformers, these frequency-domain methods can significantly reduce the computational power required in ASR (Automatic Speech Recognition) based applications. Like other reference-signal-based techniques, SPFDBB and FDABB mitigate microphone mismatch, desired-signal cancellation caused by reflection effects, and resolution loss due to the array's position. Additionally, the proposed methods are suitable for both near-field and far-field environments. In general, the convolution between the channel and the speech source in the time domain cannot be modeled accurately as a multiplication in the frequency domain with a finite window size, especially in ASR applications. SPFDBB and FDABB approximate this multiplication by treating several frames as a block, achieving a better beamforming result. Moreover, FDABB adjusts the number of frames on-line to cope with variations in the characteristics of both speech and interference signals. A better performance was found to be achievable by combining these methods with an ASR mechanism.

  • Near-Field Sound-Source Localization Based on a Signed Binary Code

    Miki SATO  Akihiko SUGIYAMA  Osamu HOSHUYAMA  Nobuyuki YAMASHITA  Yoshihiro FUJITA  

     
    PAPER-Digital Signal Processing

    Vol: E88-A No:8, Page(s): 2078-2086

    This paper proposes near-field sound-source localization based on the crosscorrelation of a signed binary code. The signed binary code eliminates multibit signal processing for simpler implementation. Explicit formulae under a near-field assumption are derived for a two-microphone scenario and extended to a three-microphone case with front-rear discrimination. An adaptive threshold for enabling and disabling source localization is developed for robustness in noisy environments. The proposed sound-source localization algorithm is implemented on a fixed-point DSP. Evaluation results in a robot scenario demonstrate that the near-field assumption and front-rear discrimination provide almost a 40% improvement in DOA estimation, and a correct detection rate of 85% is obtained by a robot in a home environment.
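    The cost-saving idea of correlating 1-bit signals can be sketched as below; the actual signed binary code, adaptive threshold, and front-rear logic of the paper are not reproduced, and equal-length inputs are assumed.

      import numpy as np

      def sign_correlation_lag(x1, x2, max_lag):
          """Crosscorrelate the signs of two equal-length microphone signals and
          return the lag (in samples) with the largest score; using only signs
          removes the multibit multiplications."""
          s1, s2 = np.sign(x1), np.sign(x2)
          n = len(s1)
          lags = np.arange(-max_lag, max_lag + 1)
          scores = [np.sum(s1[max(0, -k): n - max(0, k)] * s2[max(0, k): n - max(0, -k)])
                    for k in lags]
          return int(lags[int(np.argmax(scores))])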

  • Multiple Signal Classification by Aggregated Microphones

    Mitsuharu MATSUMOTO  Shuji HASHIMOTO  

     
    PAPER-Microphone Array

    Vol: E88-A No:7, Page(s): 1701-1707

    This paper introduces a multiple signal classification (MUSIC) method that utilizes the transfer characteristics of microphones located at the same position, namely aggregated microphones. A conventional microphone array realizes sound localization according to the differences in arrival time, phase shift, and level of the sound wave among the microphones, and is therefore difficult to miniaturize. The objective of our research is to build a reliable, miniaturized sound localization system using aggregated microphones. In this paper, we describe a sound system with N microphones and show that the microphone array system and the proposed aggregated microphone system can be described in the same framework. We apply multiple signal classification to the method that utilizes the transfer characteristics of microphones placed at the same location and compare the proposed method with the microphone array. Since all microphones are placed at the same location, the system is easy to miniaturize, which is useful for practical applications. Experimental results obtained in an ordinary room verify the validity of the measurement.
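    For orientation, a generic narrowband MUSIC pseudospectrum is sketched below; in the aggregated-microphone setting the candidate steering vectors would be built from the microphones' transfer characteristics rather than from inter-element delays, which is not shown here.

      import numpy as np

      def music_pseudospectrum(R, steering_vectors, n_sources):
          """Narrowband MUSIC: project candidate steering vectors onto the noise
          subspace of the spatial covariance matrix and invert the residual power.

          R : (M, M) covariance of the M-channel observations at one frequency
          steering_vectors : (n_candidates, M) complex candidate vectors
          """
          eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
          En = eigvecs[:, : R.shape[0] - n_sources]     # noise-subspace eigenvectors
          spectrum = []
          for a in steering_vectors:
              proj = En.conj().T @ a                    # component of a in the noise subspace
              spectrum.append(1.0 / (np.real(np.vdot(proj, proj)) + 1e-12))
          return np.array(spectrum)                     # peaks indicate source positions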

  • Robust Subspace Analysis and Its Application in Microphone Array for Speech Enhancement

    Zhu Liang YU  Meng Hwa ER  

     
    PAPER-Microphone Array

    Vol: E88-A No:7, Page(s): 1708-1715

    A robust microphone array for speech enhancement and noise suppression is studied in this paper. To overcome the target-signal cancellation problem of conventional beamformers caused by array imperfections or the reverberation of the acoustic enclosure, the proposed microphone array adopts an arbitrary model of the channel transfer function (TF) relating the microphones to the speech source. Since estimating the channel TF itself is often intractable, the transfer function ratio (TFR) is estimated instead and used to form a suboptimal beamformer. A robust TFR estimation method based on signal subspace analysis is proposed for stationary or slowly varying noise. Experiments using simulated signals and actual signals recorded in a real room show that the proposed method performs well in adverse environments.

  • Separation of Sound Sources Propagated in the Same Direction

    Akio ANDO  Masakazu IWAKI  Kazuho ONO  Koichi KUROZUMI  

     
    PAPER-Blind Source Separation

    Vol: E88-A No:7, Page(s): 1665-1672

    This paper describes a method for separating a target sound from other noise arriving from the same direction, where the target therefore cannot be separated by directivity control. Microphones are arranged in a line toward the sources to form null sensitivity points at given distances from the microphones. The null points exclude non-target sound sources on the basis of weighting coefficients for the microphone outputs determined by blind source separation. The separation problem is thereby simplified to instantaneous separation by adjusting the time delays of the microphone outputs. The system uses a direct (i.e., non-iterative) algorithm for blind separation based on second-order statistics, assuming that all sources are non-stationary signals. Simulations show that the two-microphone system can separate a target sound with a separability of more than 40 dB for the two-source problem, and 25 dB for the three-source problem when the other sources are adjacent.
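    The null-at-a-distance idea for the two-microphone, in-line case can be sketched under free-field, spherical-spreading assumptions as below; the paper instead obtains the weighting coefficients from blind source separation, so this closed-form version is only illustrative.

      import numpy as np

      def cancel_source_at_distance(x_near, x_far, d_null, spacing, fs, c=343.0):
          """Cancel a point source at distance d_null on the array axis using two
          in-line microphones: delay the near-microphone signal by the extra
          travel time to the far microphone and subtract it, weighted by the
          spherical attenuation ratio."""
          n = len(x_near)
          freqs = np.fft.rfftfreq(n, 1.0 / fs)
          delay = spacing / c                                     # extra propagation time
          gain = d_null / (d_null + spacing)                      # 1/r amplitude ratio
          delayed = np.fft.irfft(np.fft.rfft(x_near) * np.exp(-2j * np.pi * freqs * delay), n)
          return x_far - gain * delayed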

Results 21-40 of 72