Keyword Search Result

[Keyword] CELP coding(6hit)

  • 4-kbit/s Multi-Dispersed-Pulse-Based CELP (MDP-CELP) Speech Coder

    Hiroyuki EHARA  Koji YOSHIDA  Kazutoshi YASUNAGA  Toshiyuki MORII  

     
    PAPER-Speech and Hearing

    Vol: E85-D  No: 2  Page(s): 392-401

    This paper presents a high-quality 4-kbit/s speech coding algorithm based on a CELP algorithm. The coder operates on speech frames of 20 ms. The algorithm has the following four main features: multiple sub-codebooks, backward adaptive mode switching, a dispersed-pulse structure, and noise post-processing. The multiple sub-codebooks consist of a pulse-codebook and a random-codebook so that they can handle both noise-like signals (e.g. unvoiced speech, stationary noise) and pulse-like signals (e.g. voiced speech). The backward adaptive mode switching is performed using decoded parameters; therefore, no additional mode bit is transmitted. The random-codebook size is switched according to the backward-adaptively selected mode. This switching improves the subjective quality of unvoiced speech and noise-like signals because the random-codebook size is greatly increased in the corresponding mode. The dispersed-pulse structure improves the performance of sparse pulse excitation by using dispersed pulses instead of simple unit pulses. The noise post-processing employs a stationary background noise generator to produce a stationary noise signal. It significantly improves the subjective quality of the decoded signal under various background noise conditions. Subjective listening tests are conducted in accordance with ACR and DCR test procedures. The ACR test results indicate that the fundamental performance of the MDP-CELP is equivalent to that of 32-kbit/s adaptive differential pulse code modulation (ADPCM). The DCR test results show that the performance of the MDP-CELP is equivalent to or better than that of 8-kbit/s conjugate-structure algebraic code excited linear prediction (CS-ACELP) under several background noise conditions.
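
The dispersed-pulse idea in the abstract above can be illustrated with a minimal sketch: each unit pulse of a sparse excitation is replaced by a short dispersion pattern, which amounts to convolving the sparse pulse vector with that pattern. The function name `disperse_pulses`, the 3-tap pattern, and the 8-sample subframe are illustrative assumptions and are not taken from the paper.

```python
def disperse_pulses(positions, signs, pattern, length):
    """Build an excitation by placing a dispersion pattern at each pulse position.

    positions, signs: sparse pulse locations and their +1/-1 signs.
    pattern: hypothetical dispersion shape (not the one used in the paper).
    length: subframe length in samples; samples past the end are truncated.
    """
    excitation = [0.0] * length
    for pos, sign in zip(positions, signs):
        for k, coeff in enumerate(pattern):
            if pos + k < length:
                excitation[pos + k] += sign * coeff
    return excitation


# Two pulses in an 8-sample subframe, dispersed with a 3-tap pattern.
example = disperse_pulses([1, 5], [1, -1], [1.0, 0.5, 0.25], 8)
```

With the single-tap pattern `[1.0]` the function reduces to ordinary unit-pulse excitation, which is the baseline the abstract compares against.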

  • A 16 kb/s Wideband CELP-Based Speech Coder Using Mel-Generalized Cepstral Analysis

    Kazuhito KOISHIDA  Gou HIRABAYASHI  Keiichi TOKUDA  Takao KOBAYASHI  

     
    PAPER-Speech and Hearing

    Vol: E83-D  No: 4  Page(s): 876-883

    We propose a wideband CELP-type speech coder at 16 kb/s based on a mel-generalized cepstral (MGC) analysis technique. Compared with linear predictive (LP) analysis, MGC analysis gives a more accurate representation of spectral zeros and takes a perceptual frequency scale into account. A major advantage of the proposed coder is that the benefits of the MGC representation of speech spectra can be incorporated into the CELP coding process. Subjective tests show that the proposed coder at 16 kb/s achieves a significant improvement in performance over a conventional 16 kb/s CELP coder under the same coding framework and bit allocation. Moreover, the proposed coder is found to outperform the ITU-T G.722 standard at 64 kb/s.

  • A 6.4-kbit/s Variable-Bit-Rate Extension to the G.729 (CS-ACELP) Speech Coder

    Akitoshi KATAOKA  Sachiko KURIHARA  Shinji HAYASHI  

     
    PAPER-Speech Processing and Acoustics

    Vol: E80-D  No: 12  Page(s): 1183-1189

    This paper proposes a 6.4-kbit/s extension to G.729 (conjugate structure algebraic code excited linear prediction: CS-ACELP). Each G.729 module was investigated to determine which bits could be removed without hurting the speech quality, and two coders with different bit allocations were then designed. They use two different algebraic codebooks (a 10-bit algebraic codebook that has two pulses and an 11-bit algebraic codebook that has two or three pulses). This paper also proposes a conditional orthogonalized search for a fixed codebook to improve the speech quality. The conditional orthogonalized search chooses one of two search methods (orthogonalized or non-orthogonalized) based on the optimum pitch gain. The quality of the two coders was evaluated using objective measurements (SNR and segmental SNR) and subjective ones (mean opinion score (MOS) and a pair-comparison test). The selected coder was evaluated under practical conditions. Subjective test results indicate that the quality of the proposed coder (10-ms frame length) is equivalent to that of the 6.3-kbit/s G.723.1 coder, which has a 30-ms frame length.
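
To make the bit budget behind the abstract above concrete, the following short sketch computes the number of bits available per frame from the bit rate and frame length. The 10-ms and 30-ms frame lengths and the 6.4- and 6.3-kbit/s rates come from the abstract; the 8-kbit/s rate and 10-ms frame of the original G.729 are the standard's published figures. The helper name `bits_per_frame` is an illustrative assumption.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Return the number of bits available in one frame.

    bit_rate_bps: coder bit rate in bit/s.
    frame_ms: frame length in milliseconds.
    """
    return bit_rate_bps * frame_ms // 1000


proposed = bits_per_frame(6400, 10)  # 6.4-kbit/s extension, 10-ms frame
g729 = bits_per_frame(8000, 10)      # original G.729, 10-ms frame
g7231 = bits_per_frame(6300, 30)     # G.723.1 at 6.3 kbit/s, 30-ms frame
removed = g729 - proposed            # bits that must be removed per frame
```

Under these figures the extension has 64 bits per frame against 80 for G.729, so 16 bits per 10-ms frame have to be saved across the modules.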

  • Improved CELP-Based Coding in a Noisy Environment Using a Trained Sparse Conjugate Codebook

    Akitoshi KATAOKA  Sachiko KURIHARA  Shinji HAYASHI  Takehiro MORIYA  

     
    PAPER-Speech Processing and Acoustics

    Vol: E79-D  No: 2  Page(s): 123-129

    A trained sparse conjugate codebook is proposed for improving the speech quality of CELP-based coding in a noisy environment. Although CELP coding provides high quality at a low bit rate in a silent environment (clean speech), it cannot provide satisfactory quality in a noisy environment because the conventional fixed codebook is designed for clean speech. The proposed codebook consists of two sub-codebooks; each sub-codebook consists of a random component and a trained component. Each component has excitation vectors consisting of a few pulses. In the random component, pulse position and amplitude are determined randomly. Since the random component does not depend on the speech characteristics, it handles noise better than the trained one. The trained component maintains high quality for clean speech. Since the excitation vector is the sum of the two sub-excitation vectors, this codebook handles various speech conditions by selecting a sub-vector from each component. This codebook also reduces the computational complexity of the fixed codebook search and the memory requirements compared with the conventional codebook. Subjective testing (absolute category rating (ACR) and degradation category rating (DCR)) indicated that this codebook improves speech quality compared with the conventional trained codebook for noisy speech. The ACR test showed that the quality of the 8 kbit/s CELP coder with this codebook is equivalent to that of the 32 kbit/s ADPCM for clean speech.
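
The conjugate structure described above, in which the excitation is the sum of one vector from each sub-codebook, can be sketched as an exhaustive pair search. This is a simplified illustration under stated assumptions: the error is measured directly on the excitation, without the synthesis filter, perceptual weighting, or gain terms that a real CELP search would include, and the function name and toy codebooks are hypothetical.

```python
def search_conjugate(target, sub_a, sub_b):
    """Pick one vector from each sub-codebook so that their sum best matches target.

    Assumption: plain squared error on the excitation itself; a real CELP
    coder would compare filtered, gain-scaled candidates instead.
    Returns the pair of indices (index into sub_a, index into sub_b).
    """
    best = None
    for i, a in enumerate(sub_a):
        for j, b in enumerate(sub_b):
            err = sum((t - (x + y)) ** 2 for t, x, y in zip(target, a, b))
            if best is None or err < best[0]:
                best = (err, i, j)
    return best[1], best[2]


# Toy sub-codebooks with one pulse per vector (illustrative values only).
sub_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
sub_b = [[0.0, -1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
indices = search_conjugate([0.0, 0.1, 0.9, 1.1], sub_a, sub_b)
```

Two sub-codebooks of N vectors each cover N × N summed excitations while storing only 2N vectors, which is one way to read the memory saving mentioned in the abstract.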

  • 8-kb/s Low-Delay Speech Coding with 4-ms Frame Size

    Yoshiaki ASAKAWA  Preeti RAO  Hidetoshi SEKINE  

     
    PAPER

    Vol: E78-A  No: 8  Page(s): 927-933

    This paper describes modifications to a previously proposed 8-kb/s 4-ms-delay CELP speech coding algorithm with a view to improving the speech quality while maintaining low delay and only moderately increasing complexity. The modifications are intended to improve the effectiveness of interframe pitch lag prediction and the sub-optimality level of the excitation coding to the backward adapted synthesis filter by using delayed decision and joint optimization techniques. Results of subjective listening tests using Japanese speech indicate that the coded speech quality is significantly superior to that of the 8-kb/s VSELP coder which has a 20-ms delay. A method that reduces the computational complexity of closed-loop 3-tap pitch prediction with no perceptible degradation in speech quality is proposed, based on representing the pitch-tap vector as the product of a scalar pitch gain and a normalized shape codevector.
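
The gain-shape representation of the 3-tap pitch predictor mentioned above can be sketched as follows. For each normalized shape codevector the prediction is computed once with unit gain, and the best scalar gain then follows in closed form from a correlation and an energy term. This is a generic gain-shape search written under assumptions, not the paper's procedure: the tap alignment around the lag, the function names, and the toy data are all hypothetical.

```python
def unit_gain_prediction(past, lag, shape, n):
    """3-tap prediction with unit gain, taps centred on `lag`.

    Assumes n < lag < len(past) so every index stays inside `past`.
    """
    start = len(past) - lag
    return [sum(shape[k] * past[start + i - 1 + k] for k in range(3))
            for i in range(n)]


def search_gain_shape(target, past, lag, shapes):
    """Return (shape index, scalar gain) minimising the squared prediction error."""
    target_energy = sum(t * t for t in target)
    best = None
    for idx, shape in enumerate(shapes):
        u = unit_gain_prediction(past, lag, shape, len(target))
        energy = sum(x * x for x in u)
        if energy == 0.0:
            continue  # this shape predicts nothing; skip it
        corr = sum(t * x for t, x in zip(target, u))
        err = target_energy - corr * corr / energy  # error at the optimal gain
        if best is None or err < best[0]:
            best = (err, idx, corr / energy)
    return best[1], best[2]


past = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
shapes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
result = search_gain_shape([8.0, 10.0], past, 4, shapes)
```

Searching over a small set of shapes and solving for one scalar gain replaces a joint search over three independent tap values.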

  • Characteristics of Multi-Layer Perceptron Models in Enhancing Degraded Speech

    Thanh Tung LE  John MASON  Tadashi KITAMURA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 744-750

    A multi-layer perceptron (MLP) acting directly in the time domain is applied as a speech signal enhancer, and its performance is examined in the context of three common classes of degradation, namely non-linear system degradation from a low bit-rate CELP coder, additive noise, and convolution by a linear system. The investigation focuses on two topics: (i) the influence of non-linearities within the network and (ii) network topology, comparing single and multiple output structures. The objective is to examine how these characteristics influence network performance and whether this depends on the class of degradation. Experimental results show the importance of matching the enhancer to the class of degradation. In the case of the CELP coder, the standard MLP with its inherently non-linear characteristics is shown to be consistently better than any equivalent linear structure (up to 3.2 dB compared with 1.6 dB SNR improvement). In contrast, when the degradation is from additive noise, a linear enhancer is always superior.
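
The SNR improvement figures quoted above compare the signal-to-noise ratio before and after enhancement. A minimal sketch of that measurement follows, assuming time-aligned clean, degraded, and enhanced signals of equal length; the function names and sample values are illustrative and do not reproduce the paper's evaluation setup.

```python
import math


def snr_db(clean, processed):
    """SNR in dB, treating the difference from the clean signal as noise."""
    signal = sum(c * c for c in clean)
    noise = sum((c - p) ** 2 for c, p in zip(clean, processed))
    return 10.0 * math.log10(signal / noise)


def snr_improvement_db(clean, degraded, enhanced):
    """Gain in SNR (dB) obtained by passing the degraded signal through an enhancer."""
    return snr_db(clean, enhanced) - snr_db(clean, degraded)


clean = [1.0, -1.0, 1.0, -1.0]
degraded = [1.2, -0.8, 1.2, -0.8]   # error of 0.2 per sample
enhanced = [1.1, -0.9, 1.1, -0.9]   # error of 0.1 per sample
improvement = snr_improvement_db(clean, degraded, enhanced)
```

Halving the error amplitude, as in this toy data, corresponds to an improvement of about 6 dB.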