IEICE global.ieice.org Site

Keyword Search Result

[Keyword] wideband speech (8 hits)

Phonetically Balanced Text Corpus Design Using a Similarity Measure for a Stereo Super-Wideband Speech Database
Yoo Rhee OH Yong Guk KIM Mina KIM Hong Kook KIM Mi Suk LEE Hyun Joo BAE

PAPER-Speech and Hearing

Vol: E94-D No:7
Page(s): 1459-1466
In this paper, we propose a text corpus design method for a Korean stereo super-wideband speech database. Since speech coding generally requires only a small-sized text corpus, the corpus should be designed to comply with the pronunciation behavior of natural conversation in order to ensure efficient speech quality tests. To this end, the proposed design method utilizes a similarity measure between the phoneme distribution occurring in natural conversation and that of the designed text corpus. We first collect and refine text data from textbooks and websites. Next, a corpus is designed from the refined text data, using the similarity measure to compare phoneme distributions. We then construct a Korean stereo super-wideband speech (K-SW) database using the designed text corpus, where the recording environment is set to meet the conditions defined by ITU-T. Finally, the subjective quality of the K-SW database is evaluated using an ITU-T super-wideband codec in order to demonstrate that the K-SW database is useful for developing and evaluating super-wideband codecs.
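The selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the search procedure, so the cosine similarity and the greedy selection loop below are assumptions, as are the function names. Each sentence is represented as a list of phoneme symbols.

```python
from collections import Counter
from math import sqrt

def distribution(units):
    """Relative frequency of each phoneme symbol in a flat list."""
    counts = Counter(units)
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def select_corpus(candidates, target, size):
    """Greedily add the sentence that brings the corpus closest to target."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < size:
        def score(sentence):
            units = [u for s in selected + [sentence] for u in s]
            return cosine_similarity(distribution(units), target)
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

Here `target` stands for the phoneme distribution measured from natural conversation, and `candidates` for the refined sentences. A greedy loop keeps the sketch short; it does not guarantee the best possible corpus of a given size.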
Low-Complexity Wideband LSF Quantization Using Algebraic Trellis VQ
Abdellah KADDAI Mohammed HALIMI

PAPER-Speech and Hearing

Vol: E92-D No:12
Page(s): 2478-2486
In this paper, an algebraic trellis vector quantization (ATVQ) scheme that introduces algebraic codebooks into the trellis coded vector quantization (TCVQ) structure is presented. Low encoding complexity and minimal memory storage requirements are achieved using the proposed approach. It exploits the advantages of both TCVQ and algebraic codebooks, namely delayed decision, codebook widening, low computational complexity, and no codebook storage. This novel vector quantization scheme is used to encode the line spectral frequency (LSF) parameters of wideband speech. Experimental results on wideband speech show that ATVQ yields the same performance as traditional split vector quantization (SVQ) and TCVQ in terms of spectral distortion (SD). It achieves transparent quality at 47 bits/frame with a considerable reduction in memory storage and computational complexity compared to SVQ and TCVQ.
A G.711 Embedded Wideband Speech Coding for VoIP Conferences
Yusuke HIWASAKI Hitoshi OHMURO Takeshi MORI Sachiko KURIHARA Akitoshi KATAOKA

PAPER-Speech and Hearing

Vol: E89-D No:9
Page(s): 2542-2552
This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it is highly interoperable with existing terminals, so costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multiple-point remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker location is selected from among the remote locations, and its higher-band signal is redistributed with the mixed lower-band signal. Through subjective evaluations, we show that the speech quality can be maintained even when the speech signals are partially mixed.
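The partial mixing idea can be sketched as follows. This is an illustration only, not the paper's implementation: it works on already decoded sample frames, and choosing the main speaker by lower-band frame energy is an assumption, since the abstract does not say how the main speaker location is selected. The names `energy` and `partial_mix` are hypothetical.

```python
def energy(samples):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in samples)

def partial_mix(sites):
    """Mix lower bands of all sites; forward one site's higher band.

    sites: list of (low_band, high_band) frames, all of equal length.
    Returns (mixed_low, selected_high, main_index).
    """
    n = len(sites[0][0])
    # Lower band: ordinary sample-wise mixing across every location.
    mixed_low = [sum(low[i] for low, _ in sites) for i in range(n)]
    # Higher band: pick a single "main speaker" (assumed: loudest lower band).
    main = max(range(len(sites)), key=lambda k: energy(sites[k][0]))
    return mixed_low, sites[main][1], main
```

Only the lower band is summed, so the higher band is passed through from one location without being mixed, which is where the complexity saving comes from.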
Multimedia Quality Prediction Methodologies for Advanced Mobile and IP-Based Telephony Open Access
Nobuhiko KITAWAKI

INVITED PAPER

Vol: E89-B No:2
Page(s): 262-272
This paper describes the author's perspective on multimedia quality prediction methodologies for multimedia communications in advanced mobile and internet protocol (IP)-based telephony, and reports related experiments and trials. First, the paper describes the need for perceptual QoS (Quality of Service) assessment in which various quality factors in multimedia communications for advanced mobile and IP-based telephony are analyzed. Then an objective quality prediction scheme is proposed from the viewpoints of quality measurement tools for each quality factor and an opinion model for compound quality factors in mobile and IP-based communications networks. Finally, the author's current trials of measurement tools and opinion models are described.
Multiband Vector Quantization Based on Inner Product for Wideband Speech Coding
Joon-Hyuk CHANG Sanjit K. MITRA

LETTER-Speech and Hearing

Vol: E88-D No:11
Page(s): 2606-2608
This paper describes a multiband vector quantization (VQ) technique based on the inner product for wideband speech coding at 16 kb/s. Our approach consists of splitting the input speech into two separate bands and then applying an independent coding scheme to each band. A code excited linear prediction (CELP) coder is used in the lower band, while a transform-based coding strategy is applied in the higher band. The spectral components in the higher frequency band are represented by a set of modulated lapped transform (MLT) coefficients. The higher frequency band is divided into three subbands, and the MLT coefficients form a vector for each subband. Specifically, for the VQ of these vectors, an inner product-based distance measure is proposed as a new strategy. The proposed 16 kb/s coder with the inner product-based distortion measure achieves better performance than the 48 kb/s ITU-T G.722 in subjective quality tests.
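As a rough illustration of an inner product-based codebook search, the sketch below picks the unit-norm codeword with the largest inner product against a subband vector. The abstract does not define the paper's measure, so this shape-gain style search, the unit-norm codebook, and the function names are all assumptions. With unit-norm codewords, scaling codeword c by the gain g = <x, c> leaves a residual energy of ||x||^2 - <x, c>^2, so a larger positive inner product means a smaller error.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def inner_product_search(vector, codebook):
    """Return (index, gain) of the unit-norm codeword with the largest inner product."""
    best_index, best_gain = 0, float("-inf")
    for i, codeword in enumerate(codebook):
        gain = sum(x * c for x, c in zip(vector, codeword))
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain
```

The returned gain is the projection of the input onto the chosen codeword, so a decoder could rebuild the vector as the gain times that codeword.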
Speech Quality Enhancement Using Wavelet Reconstruction Filters
Seiji HAYASHI Masahiro SUGUIMOTO

LETTER-Speech and Hearing

Vol: E88-D No:6
Page(s): 1299-1303
The present paper describes a quality enhancement method for band-limited speech signals. In regular telephone communication, the quality of the received speech signal is degraded by band limitation. We propose an effective but simple scheme for obtaining wideband speech signals in which the missing frequency components are estimated from band-limited signals. The proposed method utilizes aliasing components generated by wavelet reconstruction filters in the inverse discrete wavelet transform. The enhancement has been verified by applying this method to speech samples transmitted via telephone lines, yielding a noticeable improvement in speech quality.
Objective Quality Assessment of Wideband Speech Coding
Nobuhiko KITAWAKI Kou NAGAI Takeshi YAMADA

PAPER-Network

Vol: E88-B No:3
Page(s): 1111-1118
Recently, wideband speech communication using 7 kHz-wideband speech coding, as described in ITU-T Recommendations G.722, G.722.1, and G.722.2, has become increasingly necessary for advanced IP telephony using PCs. For this application, hands-free communication using separate microphones and loudspeakers is indispensable, and in this situation wideband speech is particularly helpful in enhancing the naturalness of communication. An objective quality measurement methodology for wideband speech coding has been studied, its essential components being an objective quality measure and an input test signal. This paper describes Wideband-PESQ conforming to the draft Annex to ITU-T Recommendation P.862, "Perceptual Evaluation of Speech Quality (PESQ)," as the objective quality measure, by evaluating the consistency between the subjectively evaluated MOS (Mean Opinion Score) and the objectively estimated MOS. This paper also describes the verification of artificial voice conforming to Recommendation P.50, "Artificial Voices," as the input test signal for such measurements, by evaluating the consistency between the objectively estimated MOS using a real voice and that obtained using an artificial voice.
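One common way to quantify the consistency between two sets of MOS values is to compute their correlation and root-mean-square error. The abstract does not state which statistics the authors used, so the choice of Pearson correlation and RMSE below is an assumption, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rmse(x, y):
    """Root-mean-square error between two equal-length score lists."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

In this reading, `x` would hold the subjectively evaluated MOS per test condition and `y` the objectively estimated MOS for the same conditions. The same two functions could compare estimates obtained with a real voice against those obtained with an artificial voice.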
A 16 kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis
Kazuhito KOISHIDA Gou HIRABAYASHI Keiichi TOKUDA Takao KOBAYASHI

PAPER-Speech and Hearing

Vol: E83-D No:4
Page(s): 876-883
We propose a wideband CELP-type speech coder at 16 kb/s based on a mel-generalized cepstral (MGC) analysis technique. MGC analysis makes it possible to obtain a more accurate representation of spectral zeros than linear predictive (LP) analysis and to take a perceptual frequency scale into account. A major advantage of the proposed coder is that the benefits of the MGC representation of speech spectra can be incorporated into the CELP coding process. Subjective tests show that the proposed coder at 16 kb/s achieves a significant improvement in performance over a conventional 16 kb/s CELP coder under the same coding framework and bit allocation. Moreover, the proposed coder is found to outperform the ITU-T G.722 standard at 64 kb/s.