IEICE global.ieice.org Site

Author Search Result

[Author] Akitoshi KATAOKA(10hit)

A G.711 Embedded Wideband Speech Coding for VoIP Conferences
Yusuke HIWASAKI Hitoshi OHMURO Takeshi MORI Sachiko KURIHARA Akitoshi KATAOKA

PAPER-Speech and Hearing

Vol: E89-D, No: 9, Page(s): 2542-2552
This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it is highly interoperable with existing terminals, so costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multipoint remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker location is selected from among the remote locations, and its higher-band signal is redistributed with the mixed lower-band signal. Through subjective evaluations, we show that the speech quality can be maintained even when the speech signals are partially mixed.
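The partial mixing idea in this abstract can be illustrated with a minimal sketch. It assumes each site's frame is already split into a lower-band and a higher-band sample list, and it picks the main speaker by lower-band energy; that selection rule, the function name `partial_mix`, and the data layout are assumptions for illustration, not details taken from the paper.

```python
def partial_mix(sites):
    """Partially mix one frame from several conference sites.

    sites: dict mapping a site name to {"low": [...], "high": [...]},
    the lower-band and higher-band samples of that site's frame
    (hypothetical layout).
    """
    names = list(sites)
    frame_len = len(sites[names[0]]["low"])

    # Mix only the lower band: sample-wise sum across all sites.
    low_mix = [sum(sites[n]["low"][i] for n in names) for i in range(frame_len)]

    # Assumption: the "main speaker" is the site with the largest
    # lower-band energy in this frame.
    main = max(names, key=lambda n: sum(x * x for x in sites[n]["low"]))

    # The higher band of the main speaker is forwarded unmixed.
    return low_mix, sites[main]["high"], main


frames = {
    "A": {"low": [1.0, -1.0], "high": [0.1, 0.2]},
    "B": {"low": [3.0, 2.0], "high": [0.5, 0.6]},
    "C": {"low": [0.0, 0.5], "high": [0.0, 0.0]},
}
low_mix, high_band, main_site = partial_mix(frames)
```

Only the lower band costs a per-sample sum over all sites; the higher band is a selection, which is where the complexity saving described in the abstract comes from.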
Improving Power Spectra Estimation in 2-Dimensional Areas Using Number of Active Sound Sources
Yusuke HIOKA Ken'ichi FURUYA Yoichi HANEDA Akitoshi KATAOKA

PAPER-Engineering Acoustics

Vol: E94-A, No: 1, Page(s): 273-281
An improved method for estimating the power spectra of sound sources located in a particular 2-dimensional area is proposed. We previously proposed a method that estimates sound power spectra using multiple fixed beamformings in order to emphasize speech located in a particular 2-dimensional area. However, that method has one drawback: the number of areas where active sound sources are located must be restricted. This restriction makes the method less effective when many noise sources located in different areas are simultaneously active. In this paper, we reveal the cause of this restriction and determine the maximum number of areas for which the method is able to simultaneously estimate sound power spectra. We then introduce a procedure for identifying the areas that include active sound sources, which reduces the number of unknown power spectra to be estimated. The effectiveness of the proposed method is examined by experimental evaluation applied to sounds recorded in a practical environment.
Enhancement of Sound Sources Located within a Particular Area Using a Pair of Small Microphone Arrays
Yusuke HIOKA Kazunori KOBAYASHI Ken'ichi FURUYA Akitoshi KATAOKA

PAPER-Engineering Acoustics

Vol: E91-A, No: 2, Page(s): 561-574
A method for extracting a sound signal from a particular area that is surrounded by multiple ambient noise sources is proposed. This method performs several fixed beamformings on a pair of small microphone arrays separated from each other to estimate the signal and noise power spectra. Noise suppression is achieved by applying spectrum emphasis to the output of fixed beamforming in the frequency domain, which is derived from the estimated power spectra. In experiments performed in a room with reverberation, this method succeeded in suppressing the ambient noise, giving an SNR improvement of more than 10 dB, which is better than the performance of the conventional fixed and adaptive beamforming methods using a large-aperture microphone array. We also confirmed that this method keeps its performance even if the noise source location changes continuously or abruptly.
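As a rough illustration of applying a spectral gain derived from estimated signal and noise power spectra to a beamformer output, the sketch below uses a Wiener-style gain with a lower floor. The Wiener form, the floor value, and the function names are assumptions; the abstract does not give the actual emphasis rule.

```python
def spectral_gain(signal_power, noise_power, floor=0.1):
    """Per-frequency-bin gain from estimated power spectra.

    Assumption: a Wiener-style gain S / (S + N), limited from below by
    `floor` to avoid over-suppression.
    """
    gains = []
    for s, n in zip(signal_power, noise_power):
        total = s + n
        g = s / total if total > 0 else floor
        gains.append(max(g, floor))
    return gains


def apply_gain(beamformer_spectrum, gains):
    """Scale each frequency bin of the fixed-beamformer output."""
    return [g * x for g, x in zip(gains, beamformer_spectrum)]


gains = spectral_gain([9.0, 1.0, 0.0], [1.0, 1.0, 4.0])
enhanced = apply_gain([2 + 0j, 4 + 0j, 10 + 0j], gains)
```

Bins dominated by the target keep a gain near 1, while bins dominated by noise are attenuated down to the floor.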
Improved CELP-Based Coding in a Noisy Environment Using a Trained Sparse Conjugate Codebook
Akitoshi KATAOKA Sachiko KURIHARA Shinji HAYASHI Takehiro MORIYA

PAPER-Speech Processing and Acoustics

Vol: E79-D, No: 2, Page(s): 123-129
A trained sparse conjugate codebook is proposed for improving the speech quality of CELP-based coding in a noisy environment. Although CELP coding provides high quality at a low bit rate in a silent environment (creating clean speech), it cannot provide a satisfactory quality in a noisy environment because the conventional fixed codebook is designed to be suitable for clean speech. The proposed codebook consists of two sub-codebooks; each sub-codebook consists of a random component and a trained component. Each component has excitation vectors consisting of a few pulses. In the random component, pulse position and amplitude are determined randomly. Since the random component does not depend on the speech characteristics, it handles noise better than the trained one. The trained component maintains high quality for clean speech. Since the excitation vector is the sum of the two sub-excitation vectors, this codebook handles various speech conditions by selecting a sub-vector from each component. This codebook also reduces the computational complexity of a fixed codebook search and memory requirements compared with the conventional codebook. Subjective testing (absolute category rating (ACR) and degradation category rating (DCR)) indicated that this codebook improves speech quality compared with the conventional trained codebook for noisy speech. The ACR test showed that the quality of the 8 kbit/s CELP coder with this codebook is equivalent to that of the 32 kbit/s ADPCM for clean speech.
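The structure described here (two sub-codebooks, each holding a trained and a random component of sparse pulse vectors, with the excitation formed as the sum of one vector from each) can be sketched as follows. The vector length, codebook sizes, hand-written "trained" vectors, and the plain squared-error search are all illustrative assumptions; a real CELP search compares candidates after synthesis filtering and perceptual weighting.

```python
import random


def sparse_vector(length, pulses):
    """Build a vector that is zero except for a few (position, amplitude) pulses."""
    v = [0.0] * length
    for pos, amp in pulses:
        v[pos] += amp
    return v


def random_component(size, length, n_pulses, rng):
    """Vectors whose pulse positions and amplitudes are drawn at random."""
    book = []
    for _ in range(size):
        positions = rng.sample(range(length), n_pulses)
        pulses = [(p, rng.uniform(-1.0, 1.0)) for p in positions]
        book.append(sparse_vector(length, pulses))
    return book


def search(sub_a, sub_b, target):
    """Exhaustively pick one vector per sub-codebook so their sum best matches target."""
    best = None
    for i, a in enumerate(sub_a):
        for j, b in enumerate(sub_b):
            err = sum((t - (x + y)) ** 2 for t, x, y in zip(target, a, b))
            if best is None or err < best[0]:
                best = (err, i, j)
    return best


rng = random.Random(0)
LENGTH = 8  # hypothetical subframe length
# Placeholder "trained" vectors; real ones would come from training on speech.
trained_a = [sparse_vector(LENGTH, [(0, 1.0), (4, -1.0)])]
trained_b = [sparse_vector(LENGTH, [(2, 0.5), (6, 0.5)])]
sub_a = trained_a + random_component(3, LENGTH, 2, rng)
sub_b = trained_b + random_component(3, LENGTH, 2, rng)

target = [x + y for x, y in zip(sub_a[0], sub_b[0])]
err, i, j = search(sub_a, sub_b, target)
```

Because every vector has only a few non-zero pulses, both the stored codebook and each candidate evaluation stay small, which is consistent with the complexity and memory savings the abstract mentions.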
A 6.4-kbit/s Variable-Bit-Rate Extension to the G.729 (CS-ACELP) Speech Coder
Akitoshi KATAOKA Sachiko KURIHARA Shinji HAYASHI

PAPER-Speech Processing and Acoustics

Vol: E80-D, No: 12, Page(s): 1183-1189
This paper proposes a 6.4-kbit/s extension to G.729 (conjugate structure algebraic code excited linear prediction: CS-ACELP). Each G.729 module was investigated to determine which bits could be removed without hurting the speech quality, then two coders that have different bit allocations were designed. They have two different algebraic codebooks (a 10-bit algebraic codebook that has two pulses and an 11-bit algebraic codebook that has two or three pulses). This paper also proposes a conditional orthogonalized search for a fixed codebook to improve the speech quality. The conditional orthogonalized search chooses one of two search methods (orthogonalized or non-orthogonalized) based on the optimum pitch gain. The quality of the two coders was evaluated using objective measurements (SNR and segmental SNR) and subjective ones (mean opinion score: MOS and a pair-comparison test). The selected coder was evaluated under practical conditions. Subjective test results have indicated that the quality of the proposed coder (10-ms frame length) is equivalent to that of the 6.3-kbit/s G.723.1 coder, which has a 30-ms frame length.
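The bit budget implied by these rates can be worked out directly. The short sketch below assumes the standard 8-kbit/s G.729 rate with 10-ms frames as the baseline (a fact about G.729, not stated in the abstract); the helper name `bits_per_frame` is hypothetical.

```python
def bits_per_frame(rate_kbps, frame_ms):
    """kbit/s multiplied by ms gives bits; round to absorb float error."""
    return round(rate_kbps * frame_ms)


g729_bits = bits_per_frame(8.0, 10)       # baseline G.729 frame
extension_bits = bits_per_frame(6.4, 10)  # proposed 6.4-kbit/s extension
g7231_bits = bits_per_frame(6.3, 30)      # G.723.1 at 6.3 kbit/s
bits_removed = g729_bits - extension_bits
```

Under these assumptions the extension has 64 bits per 10-ms frame, so 16 of the 80 bits in a baseline G.729 frame have to be removed.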
Measuring the Perceived Importance of Speech Segments for Transmission over IP Networks (Open Access)
Yusuke HIWASAKI Toru MORINAGA Jotaro IKEDO Akitoshi KATAOKA

PAPER

Vol: E89-B, No: 2, Page(s): 326-333
This paper presents a way of using a linear regression model to produce a single-valued criterion that indicates the perceived importance of each block in a stream of speech blocks. This method is superior to the conventional approach, voice activity detection (VAD), in that it provides a dynamically changing priority value for speech segments with finer granularity. The approach can be used in conjunction with scalable speech coding techniques in the context of IP QoS services to achieve a flexible form of quality control for speech transmission. A simple linear regression model is used to estimate a mean opinion score (MOS) of the various cases of missing speech segments. The estimated MOS is a continuous value that can be mapped to priority levels with arbitrary granularity. Through subjective evaluation, we show the validity of the calculated priority values.
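A minimal sketch of the two steps named in this abstract, fitting a simple linear regression that estimates MOS and mapping the continuous estimate to a chosen number of priority levels, is given below. The single input feature, the synthetic training values, and the rule "lower estimated MOS when the block is lost means higher priority" are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def to_priority(mos, levels, mos_min=1.0, mos_max=5.0):
    """Quantize an estimated MOS into `levels` priority classes.

    Assumption: a block whose loss yields a lower MOS is more important,
    so it receives a higher priority index.
    """
    frac = (mos_max - mos) / (mos_max - mos_min)
    frac = min(max(frac, 0.0), 1.0)
    return min(int(frac * levels), levels - 1)


# Synthetic example values (not measured data): a hypothetical block
# feature versus the MOS observed when that block is missing.
feature = [0.0, 1.0, 2.0, 3.0]
mos_when_lost = [4.0, 3.5, 3.0, 2.5]
slope, intercept = fit_line(feature, mos_when_lost)

estimated = slope * 2.0 + intercept
priority = to_priority(estimated, levels=4)
```

Changing `levels` changes the granularity of the priority classes without refitting the model, which matches the abstract's point that the continuous estimate can be mapped with arbitrary granularity.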
Gradient-Limited Affine Projection Algorithm for Double-Talk-Robust and Fast-Converging Acoustic Echo Cancellation
Suehiro SHIMAUCHI Yoichi HANEDA Akitoshi KATAOKA Akinori NISHIHARA

PAPER-Engineering Acoustics

Vol: E90-A, No: 3, Page(s): 633-641
We propose a gradient-limited affine projection algorithm (GL-APA), which can achieve fast and double-talk-robust convergence in acoustic echo cancellation. GL-APA is derived from the M-estimation-based nonlinear cost function extended for evaluating multiple error signals dealt with in the affine projection algorithm (APA). By considering the nonlinearity of the gradient, we carefully formulate an update equation consistent with multiple input-output relationships, which the conventional APA inherently satisfies to achieve fast convergence. We also newly introduce a scaling rule for the nonlinearity, so we can easily implement GL-APA by using a predetermined primary function as a basis of scaling with any projection order. This guarantees a linkage between GL-APA and the gradient-limited normalized least-mean-squares algorithm (GL-NLMS), which is a conventional algorithm that corresponds to the GL-APA of the first order. The performance of GL-APA is demonstrated with simulation results.
An Approach to Solve Local Minimum Problem in Sound Source and Microphone Localization
Kazunori KOBAYASHI Ken'ichi FURUYA Yoichi HANEDA Akitoshi KATAOKA

PAPER-Engineering Acoustics

Vol: E90-A, No: 12, Page(s): 2826-2834
We previously proposed a method of sound source and microphone localization. The method estimates the locations of sound sources and microphones from only time differences of arrival between signals picked up by microphones even if all their locations are unknown. However, there is a problem that some estimation results converge to local minimum solutions because this method estimates locations iteratively and the error function has multiple minima. In this paper, we present a new iterative method to solve the local minimum problem. This method achieves accurate estimation by selecting effective initial locations from many random initial locations. The computer simulation and experimental results demonstrate that the presented method eliminates most local minimum solutions. Furthermore, the computational complexity of the presented method is similar to that of the previous method.
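The general idea of selecting good starting points from many random initial locations before iterating can be sketched on a toy problem. The one-dimensional error function, the "keep the lowest-error candidates" selection rule, and the plain gradient descent below are stand-ins chosen for illustration; the paper's actual error function and selection criterion are not given in the abstract.

```python
import random


def descend(f, x, step=0.01, iters=2000, h=1e-6):
    """Plain gradient descent using a central-difference gradient."""
    for _ in range(iters):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= step * grad
    return x


def multistart(f, lo, hi, n_random=200, n_keep=5, seed=0):
    """Draw many random starts, keep the most promising, refine each."""
    rng = random.Random(seed)
    candidates = [rng.uniform(lo, hi) for _ in range(n_random)]
    candidates.sort(key=f)  # evaluating the error is cheap; iterating is not
    refined = [descend(f, x) for x in candidates[:n_keep]]
    return min(refined, key=f)


def toy_error(x):
    """Toy error with a global minimum near x = -1 and a local one near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best = multistart(toy_error, -2.0, 2.0)
stuck = descend(toy_error, 1.5)  # a single unlucky start ends in the local minimum
```

Only `n_keep` starts are iterated, so the extra cost over a single run is mostly the cheap error evaluations of the random candidates.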
Robust Frequency Domain Acoustic Echo Cancellation Filter Employing Normalized Residual Echo Enhancement
Suehiro SHIMAUCHI Yoichi HANEDA Akitoshi KATAOKA

PAPER

Vol: E91-A, No: 6, Page(s): 1347-1356
We propose a new robust frequency domain acoustic echo cancellation filter that employs a normalized residual echo enhancement. By interpreting the conventional robust step-size control approaches as a statistical-model-based residual echo enhancement problem, the optimal step-size introduced in most conventional approaches is regarded as optimal only on the assumption that both the residual echo and the outlier in the error output signal are described by Gaussian distributions. However, the Gaussian-Gaussian mixture assumption does not always hold well, especially when both the residual echo and the outlier are speech signals (known as a double-talk situation). The proposed filtering scheme is based on the Gaussian-Laplacian mixture assumption for the signals normalized by the reference input signal amplitude. By comparing the performances of the proposed and conventional approaches through simulations, we show that the Gaussian-Laplacian mixture assumption for the normalized signals can provide a better control scheme for acoustic echo cancellation.
A Low Complexity Speech Codec and Its Error Protection
Jotaro IKEDO Akitoshi KATAOKA

PAPER-Source Encoding

Vol: E80-B, No: 11, Page(s): 1688-1695
This paper proposes a new speech codec based on CELP for PHS multimedia communication. PHS portable terminals should consume as little power as possible, and the codec used in them has to be robust against channel errors. Therefore, the proposed codec operates with low computational complexity while reducing the deterioration in speech quality due to channel errors. This codec uses two new schemes to reduce computational complexity. One is moving average scalar quantization for the filter coefficients of the synthesis filter. This scheme requires 90% less complexity to quantize synthesis filter coefficients compared to the widely used vector quantization. The other is pre-selection for selecting an algebraic codebook used as a random excitation source. An orthogonalization scheme is used for stable pre-selection. Deterioration of speech quality is suppressed by using CRC and parameter estimation for error protection. Two types of codec are proposed: a 10-ms frame type that transmits 160 bits every 10 ms and a 15-ms frame type that transmits 160 bits every 15 ms. The computational complexity of these codecs is less than 5 MOPS. In an environment with no channel errors, the speech quality is equal to that of ITU-T G.726 at 32.0 kbit/s. With 0.3% channel error, both codecs offer more comfortable conversation than G.726. Moreover, at 1.0% channel error, the 10-ms frame type still provides comfortable conversation.
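The combination of CRC-based error detection with parameter estimation can be sketched in a few lines. This sketch uses Python's `zlib.crc32` as a stand-in checksum and simply reuses the last good frame's parameters when the check fails; the CRC polynomial, the protected bits, and the estimation rule actually used by the codec are not given in the abstract, so all of these are assumptions.

```python
import zlib


def protect(payload):
    """Append a 4-byte CRC-32 of the coded parameters (stand-in CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def receive(frame, last_good):
    """Return (parameters, ok).

    On a CRC mismatch the parameters are estimated by repeating the last
    good frame, which is an assumed, very simple concealment rule.
    """
    payload, tag = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == tag:
        return payload, True
    return last_good, False


sent = protect(b"\x01\x02\x03")
clean = receive(sent, last_good=b"old")

# Flip one bit to simulate a channel error.
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]
concealed = receive(corrupted, last_good=b"old")
```

The decoder therefore never synthesizes speech from parameters known to be corrupted, at the cost of a few extra bits per frame.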
