
Author Search Result

[Author] Kazuya TAKEDA (29 hits)

Showing results 21-29 of 29

  • CIAIR In-Car Speech Corpus--Influence of Driving Status--

    Nobuo KAWAGUCHI  Shigeki MATSUBARA  Kazuya TAKEDA  Fumitada ITAKURA  

     
    LETTER
    Vol. E88-D, No. 3, pp. 578-582

    CIAIR, Nagoya University, has been compiling an in-car speech database since 1999. This paper discusses the basic information contained in this database and an analysis of the effects of driving status based on it. We have developed a system called the Data Collection Vehicle (DCV), which supports synchronous recording of multi-channel audio data from 12 microphones that can be placed throughout the vehicle, multi-channel video from three cameras, and the collection of vehicle-related data. During compilation, each subject had conversations with three types of dialog system: a human operator, a "Wizard of Oz" system, and a spoken dialog system. Vehicle information such as speed, engine RPM, accelerator/brake-pedal pressure, and steering-wheel motion was also recorded. In this paper, we report on the effect that driving status has on phenomena specific to spoken language.
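
    As an illustration of how per-frame driving status could be paired with the audio in such a corpus, the Python sketch below aligns a coarsely sampled vehicle-speed track to 10 ms audio frames by interpolation. The rates, threshold, and names are hypothetical and do not reflect the actual CIAIR data format.

        import numpy as np

        # Hypothetical rates: 16 kHz audio, 10 Hz vehicle-data channel.
        AUDIO_SR, CAN_SR, FRAME_SHIFT = 16000, 10, 160  # 10 ms frames

        def speed_per_frame(speed_samples, n_audio_samples):
            """Interpolate a vehicle-speed track onto audio frame times."""
            frame_times = np.arange(0, n_audio_samples, FRAME_SHIFT) / AUDIO_SR
            can_times = np.arange(len(speed_samples)) / CAN_SR
            return np.interp(frame_times, can_times, speed_samples)

        # Example: tag each 10 ms frame of a 3 s utterance as idling vs. driving.
        speed = np.abs(np.cumsum(np.random.randn(30)))   # fake 10 Hz speed track
        frames = speed_per_frame(speed, n_audio_samples=3 * AUDIO_SR)
        is_driving = frames > 5.0                         # km/h threshold (arbitrary)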

  • Investigation of DNN-Based Audio-Visual Speech Recognition

    Satoshi TAMURA  Hiroshi NINOMIYA  Norihide KITAOKA  Shin OSUGA  Yurie IRIBE  Kazuya TAKEDA  Satoru HAYAMIZU  

     
    PAPER-Acoustic modeling
    Publicized: 2016/07/19
    Vol. E99-D, No. 10, pp. 2444-2451

    Audio-Visual Speech Recognition (AVSR) is one technique for enhancing the robustness of speech recognizers in noisy or real-world environments. Deep Neural Networks (DNNs) have recently attracted a great deal of attention from researchers in the speech recognition field because they can drastically improve recognition performance. There are two ways to employ DNN techniques for speech recognition: a hybrid approach and a tandem approach. In the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN, while in the tandem approach a DNN is incorporated into the feature extraction scheme. In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be combined using DNNs. We carried out recognition experiments on the CENSREC-1-AV corpus and discuss the results to identify the best DNN-based AVSR model. The results show that a tandem method using audio and visual Deep Bottleneck Features (DBNFs) with multi-stream HMMs is the most suitable, followed by a hybrid approach and another tandem scheme using audio-visual DBNFs.
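
    To make the tandem idea concrete, here is a minimal Python sketch (not the authors' architecture) that extracts bottleneck activations from a small untrained MLP for each modality and concatenates them into a tandem feature vector; in practice the networks would be trained and the features fed to multi-stream HMMs. Layer sizes and inputs are placeholders.

        import numpy as np

        rng = np.random.default_rng(0)

        def mlp_bottleneck(x, dims=(39, 512, 30, 512)):
            """Forward a feature vector through random (untrained) MLP layers
            and return the activation of the narrowest (bottleneck) layer."""
            h, bottleneck = x, None
            for d_in, d_out in zip(dims[:-1], dims[1:]):
                w = rng.standard_normal((d_out, d_in)) * 0.01
                h = np.tanh(w @ h)
                if d_out == min(dims):          # narrowest layer = bottleneck
                    bottleneck = h
            return bottleneck

        audio_feat = rng.standard_normal(39)    # e.g. MFCCs (placeholder)
        visual_feat = rng.standard_normal(39)   # e.g. lip features (placeholder)
        audio_dbnf = mlp_bottleneck(audio_feat)
        visual_dbnf = mlp_bottleneck(visual_feat)
        tandem_feat = np.concatenate([audio_dbnf, visual_dbnf])  # to multi-stream HMMs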

  • Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

    Tran HUY DAT  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement
    Vol. E91-D, No. 3, pp. 439-447

    We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from the actual noisy speech in a frame-by-frame manner. The use of a more general prior distribution with online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, multichannel information in the form of cross-channel statistics is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better enhancement performance. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air conditioning, and an open window.
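
    As a rough illustration of MAP magnitude estimation under a generalized gamma prior, the sketch below evaluates the posterior on a grid and picks its maximizer. The Gaussian likelihood, fixed prior parameters, and grid search are simplifications; the paper derives its estimator analytically and adapts the parameters online.

        import numpy as np
        from math import gamma

        def gengamma_pdf(x, eta, nu, scale):
            """Generalized gamma density: (nu/(scale*Gamma(eta)))
            * (x/scale)^(eta*nu-1) * exp(-(x/scale)^nu)."""
            x = np.asarray(x, dtype=float)
            c = nu / (scale * gamma(eta))
            return c * (x / scale) ** (eta * nu - 1) * np.exp(-(x / scale) ** nu)

        def map_magnitude(y, noise_std, eta, nu, scale):
            """Crude MAP estimate of the clean magnitude on a grid:
            argmax_x  N(y; x, noise_std^2) * prior(x)."""
            grid = np.linspace(1e-3, 2 * y, 512)
            like = np.exp(-0.5 * ((y - grid) / noise_std) ** 2)
            post = like * gengamma_pdf(grid, eta, nu, scale)
            return grid[np.argmax(post)]

        # Frame-by-frame moment matching could update (eta, nu, scale); fixed here.
        print(map_magnitude(y=1.0, noise_std=0.5, eta=1.5, nu=1.0, scale=0.4))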

  • Selective Listening Point Audio Based on Blind Signal Separation and Stereophonic Technology

    Kenta NIWA  Takanori NISHINO  Kazuya TAKEDA  

     
    PAPER-Speech and Hearing
    Vol. E92-D, No. 3, pp. 469-476

    A sound field reproduction method is proposed that uses blind source separation and head-related transfer functions. In the proposed system, multichannel acoustic signals captured by distant microphones are decomposed into a set of location/signal pairs of virtual sound sources based on frequency-domain independent component analysis. After the locations and signals of the virtual sources are estimated, the spatial sound at the selected listening point is constructed by convolving the corresponding acoustic transfer functions with each signal. In experiments, a sound field produced by six sound sources was captured using 48 distant microphones and decomposed into sets of virtual sound sources. Subjective evaluation showed no significant difference between natural and reconstructed sound when six virtual sources were used, confirming the effectiveness of both the decomposition algorithm and the virtual source representation.
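
    The rendering stage can be illustrated with a short sketch: each separated virtual source is convolved with a two-channel head-related impulse response (HRIR) for its estimated direction, and the results are summed. The random HRIRs below are stand-ins for measured ones.

        import numpy as np

        def render_binaural(sources, hrirs):
            """Sum each virtual source convolved with the left/right
            head-related impulse responses for its estimated direction."""
            n = max(len(s) + h.shape[1] - 1 for s, h in zip(sources, hrirs))
            out = np.zeros((2, n))
            for sig, hrir in zip(sources, hrirs):   # hrir: shape (2, taps)
                for ch in range(2):
                    y = np.convolve(sig, hrir[ch])
                    out[ch, :len(y)] += y
            return out

        rng = np.random.default_rng(1)
        sources = [rng.standard_normal(1600) for _ in range(6)]          # 6 virtual sources
        hrirs = [rng.standard_normal((2, 128)) * 0.05 for _ in sources]  # stand-in HRIRs
        binaural = render_binaural(sources, hrirs)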

  • Noise Robust Speech Recognition Using Subband-Crosscorrelation Analysis

    Shoji KAJITA  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Processing and Acoustics
    Vol. E81-D, No. 10, pp. 1079-1086

    This paper describes subband-crosscorrelation analysis (SBXCOR), which uses two input channel signals. SBXCOR extends subband-autocorrelation analysis (SBCOR), a signal processing technique that extracts the periodicities associated with the inverses of the subband center frequencies present in speech signals. In addition, multi-delay weighting (MDW) is applied to SBXCOR to extract further periodicity information. In experiments, the noise robustness of SBXCOR was evaluated using a DTW word recognizer under (1) a simulated acoustic condition with white noise and (2) a real acoustic condition in a soundproof room with human speech-like noise. Under the simulated condition, SBXCOR was more robust than conventional one-channel SBCOR, but less robust than SBCOR extracted from the two-channel-summed signal. Applying MDW improved the performance of SBXCOR by about 2% at an SNR of 0 dB. The resulting performance of SBXCOR with MDW was much better than that of the smoothed group delay spectrum (SGDS) and mel-filterbank cepstral coefficients (MFCC) below an SNR of 10 dB. The results under the real acoustic condition were almost the same as under the simulated condition.
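
    A minimal sketch of the core SBXCOR computation, assuming a simple Butterworth filterbank (the paper's exact filterbank and the MDW weighting are not reproduced here): in each subband, the two channels are crosscorrelated at a lag equal to the inverse of the center frequency.

        import numpy as np
        from scipy.signal import butter, filtfilt

        def sbxcor(x1, x2, fs, center_freqs, bw_ratio=0.3):
            """Normalized crosscorrelation of two channels in each subband,
            evaluated at a lag equal to the inverse of the center frequency."""
            feats = []
            for fc in center_freqs:
                lo, hi = fc * (1 - bw_ratio / 2), fc * (1 + bw_ratio / 2)
                b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
                s1, s2 = filtfilt(b, a, x1), filtfilt(b, a, x2)
                lag = int(round(fs / fc))                    # lag = 1/fc in samples
                num = np.dot(s1[:-lag], s2[lag:])
                den = np.sqrt(np.dot(s1, s1) * np.dot(s2, s2)) + 1e-12
                feats.append(num / den)
            return np.array(feats)

        fs = 16000
        t = np.arange(fs) / fs
        x = np.sin(2 * np.pi * 500 * t)                      # common periodic component
        feat = sbxcor(x + 0.1 * np.random.randn(fs), x + 0.1 * np.random.randn(fs),
                      fs, center_freqs=[250, 500, 1000, 2000])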

  • Direction of Arrival Estimation Using Nonlinear Microphone Array

    Hidekazu KAMIYANAGIDA  Hiroshi SARUWATARI  Kazuya TAKEDA  Fumitada ITAKURA  Kiyohiro SHIKANO  

     
    PAPER
    Vol. E84-A, No. 4, pp. 999-1010

    This paper describes a new method for estimating the direction of arrival (DOA) using a nonlinear microphone array system based on complementary beamforming. Complementary beamforming uses two beamformers designed to have mutually complementary directivity patterns. Since the resultant directivity pattern is proportional to the product of these two patterns, the proposed method can estimate the DOAs of 2(K-1) sound sources with a K-element microphone array. DOA-estimation experiments were performed using both computer simulation and actual devices in real acoustic environments. The results clarify that DOA estimation for two sound sources can be accomplished with only two microphones. Also, a comparison of the resolution of the proposed method with that of the conventional minimum variance method shows that the proposed method performs better under all reverberant conditions.
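
    For orientation, the toy sketch below performs a plain narrowband steered-response DOA scan with two microphones; the paper's complementary beamforming multiplies two complementary directivity patterns to sharpen exactly this kind of scan. All signal parameters here are made up.

        import numpy as np

        C = 343.0  # speed of sound (m/s)

        def doa_scan(x1, x2, fs, d, f0):
            """Steered-response power of a 2-mic delay-and-sum beamformer at
            frequency f0, scanned over candidate angles (narrowband toy)."""
            X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
            k = int(round(f0 * len(x1) / fs))            # FFT bin of f0
            angles = np.linspace(-90, 90, 181)
            tau = d * np.sin(np.deg2rad(angles)) / C     # inter-mic delay per angle
            power = np.abs(X1[k] + X2[k] * np.exp(2j * np.pi * f0 * tau)) ** 2
            return angles, power

        fs, d, f0, theta = 16000, 0.1, 1000.0, 30.0      # 10 cm spacing, source at 30 deg
        t = np.arange(fs) / fs
        delay = d * np.sin(np.deg2rad(theta)) / C
        x1 = np.sin(2 * np.pi * f0 * t)
        x2 = np.sin(2 * np.pi * f0 * (t - delay))
        angles, power = doa_scan(x1, x2, fs, d, f0)
        print(angles[np.argmax(power)])                  # approx. 30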

  • Adaptive Nonlinear Regression Using Multiple Distributed Microphones for In-Car Speech Recognition

    Weifeng LI  Chiyomi MIYAJIMA  Takanori NISHINO  Katsunobu ITOU  Kazuya TAKEDA  Fumitada ITAKURA  

     
    PAPER-Speech Enhancement
    Vol. E88-A, No. 7, pp. 1716-1723

    In this paper, we address the problem of improving hands-free speech recognition performance in different car environments using multiple spatially distributed microphones. In previous work, we proposed multiple linear regression of the log spectra (MRLS) for estimating the log spectrum of speech at a close-talking microphone. In this paper, the concept is extended to nonlinear regression, and regression in the cepstrum domain is also investigated. An effective algorithm is developed to adapt the regression weights automatically to different noise environments. Compared to the nearest distant microphone and an adaptive beamformer (Generalized Sidelobe Canceller), the proposed adaptive nonlinear regression approach achieves average relative word error rate (WER) reductions of 58.5% and 10.3%, respectively, for isolated word recognition in 15 real car environments.
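
    The linear MRLS baseline is easy to sketch: per frequency bin, learn least-squares weights that map the distant-microphone log spectra to the close-talking log spectrum. The nonlinear and adaptive variants in the paper extend this; the data below is synthetic.

        import numpy as np

        def train_log_spectra_regression(distant, close):
            """Least-squares weights mapping distant-mic log spectra to the
            close-talking log spectrum for one frequency bin.
            distant: (frames, mics); close: (frames,). Returns (mics+1,)."""
            X = np.hstack([distant, np.ones((distant.shape[0], 1))])  # bias term
            w, *_ = np.linalg.lstsq(X, close, rcond=None)
            return w

        def apply_regression(distant, w):
            X = np.hstack([distant, np.ones((distant.shape[0], 1))])
            return X @ w

        rng = np.random.default_rng(2)
        clean = rng.standard_normal(500)                          # close-talk log power
        distant = clean[:, None] + rng.standard_normal((500, 4))  # 4 noisy distant mics
        w = train_log_spectra_regression(distant, clean)
        estimate = apply_regression(distant, w)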

  • A Single-Dimensional Interface for Arranging Multiple Audio Sources in Three-Dimensional Space

    Kento OHTANI  Kenta NIWA  Kazuya TAKEDA  

     
    PAPER-Music Information Processing
    Publicized: 2017/06/26
    Vol. E100-D, No. 10, pp. 2635-2643

    A single-dimensional interface that enables users to obtain diverse localizations of audio sources is proposed. Many conventional interfaces for arranging audio sources expose multiple arrangement parameters that control the positions of the sources, but it is difficult for users unfamiliar with these systems to optimize the parameters because the number of possible settings is huge. We propose a simple, single-dimensional interface for adjusting the arrangement parameters, allowing users to sample several diverse audio source arrangements and easily find their preferred auditory localization. To select subsets of arrangement parameters from all possible choices, auditory-localization space vectors (ASVs) are defined to represent the auditory localization produced by each parameter setting. By selecting subsets of ASVs that are approximately orthogonal, we can choose arrangement parameters that produce diverse auditory localizations. Experimental evaluations were conducted using music composed of three audio sources; subjective evaluations confirmed that novice users can obtain diverse localizations using the proposed interface.
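
    One plausible reading of the ASV selection step (the paper's exact criterion may differ) is a greedy search for approximately orthogonal vectors, as sketched below: each new pick minimizes its worst-case cosine similarity to the vectors already chosen.

        import numpy as np

        def select_diverse(asvs, k):
            """Greedily pick k rows of `asvs` that are approximately orthogonal:
            each new pick minimizes its worst |cosine| to prior picks."""
            unit = asvs / np.linalg.norm(asvs, axis=1, keepdims=True)
            chosen = [0]                                 # seed with the first vector
            while len(chosen) < k:
                sims = np.abs(unit @ unit[chosen].T)     # |cos| to chosen set
                worst = sims.max(axis=1)
                worst[chosen] = np.inf                   # never re-pick
                chosen.append(int(np.argmin(worst)))
            return chosen

        rng = np.random.default_rng(3)
        asvs = rng.standard_normal((50, 8))              # 50 candidate settings
        print(select_diverse(asvs, k=5))                 # indices of diverse settings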

  • Blind Source Separation Using Dodecahedral Microphone Array under Reverberant Conditions

    Motoki OGASAWARA  Takanori NISHINO  Kazuya TAKEDA  

     
    PAPER-Engineering Acoustics
    Vol. E94-A, No. 3, pp. 897-906

    The separation and localization of sound source signals are important techniques for many applications, such as highly realistic communication and speech recognition systems. Such systems are expected to work without prior information such as the number of sound sources or the environmental conditions. In this paper, we develop a dodecahedral microphone array and propose a novel separation method for it. The method draws on human sound localization cues and exploits the acoustical characteristics created by the shape of the dodecahedral array. It also includes a method for estimating the number of sound sources that operates without prior information. Sound source separation performance was evaluated under simulated and actual reverberant conditions, and the results were compared with a conventional method. The experimental results show that our method outperforms the conventional one.
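
    The source-counting step can be illustrated with a generic heuristic that is not the authors' dodecahedral-array method: count the dominant eigenvalues of the spatial covariance matrix of the microphone signals.

        import numpy as np

        def count_sources(mic_signals, energy_ratio=0.95):
            """Estimate the number of sources as the number of eigenvalues of
            the spatial covariance matrix needed to reach `energy_ratio` of
            the total energy (a generic heuristic, not the paper's method)."""
            X = mic_signals - mic_signals.mean(axis=1, keepdims=True)
            cov = X @ X.T / X.shape[1]
            eig = np.sort(np.linalg.eigvalsh(cov))[::-1]
            cum = np.cumsum(eig) / eig.sum()
            return int(np.searchsorted(cum, energy_ratio) + 1)

        rng = np.random.default_rng(4)
        srcs = rng.standard_normal((3, 8000))            # 3 latent sources
        mix = rng.standard_normal((12, 3)) @ srcs        # 12-mic instantaneous mixture
        mix += 0.01 * rng.standard_normal(mix.shape)     # small sensor noise
        print(count_sources(mix))                        # expected: 3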
