
Keyword Search Result

[Keyword] audio coding (10 hits)

Hits 1-10
  • Reversible Audio Data Hiding Based on Variable Error-Expansion of Linear Prediction for Segmental Audio and G.711 Speech

    Akira NISHIMURA  

     
    PAPER

      Publicized:
    2015/10/21
      Vol:
    E99-D No:1
      Page(s):
    83-91

    Reversible data hiding is a technique in which hidden data are embedded in host data such that the consistency of the host is perfectly preserved and the host data are restored during extraction of the hidden data. In this paper, a linear prediction technique for reversible data hiding of audio waveforms is improved. The proposed variable expansion method is able to control the payload size by varying the expansion factor. The proposed technique is combined with the prediction error expansion method. Reversible embedding, perfect payload detection, and perfect recovery of the host signal are achieved for a framed audio signal. A smaller expansion factor results in a smaller payload size and less degradation in the stego audio quality. Computer simulations reveal that embedding a random-bit payload of less than 0.4 bits per sample into CD-format music signals provides stego audio with acceptable objective quality. The method is also applied to G.711 µ-law-coded speech signals. Computer simulations reveal that embedding a random-bit payload of less than 0.1 bits per sample into speech signals provides stego speech with good objective quality.

  • Adaptive Spectral Masking of AVQ Coding and Sparseness Detection for ITU-T G.711.1 Annex D and G.722 Annex B Standards

    Masahiro FUKUI  Shigeaki SASAKI  Yusuke HIWASAKI  Kimitaka TSUTSUMI  Sachiko KURIHARA  Hitoshi OHMURO  Yoichi HANEDA  

     
    PAPER-Speech and Hearing

      Vol:
    E97-D No:5
      Page(s):
    1264-1272

    We propose a new adaptive spectral masking method of algebraic vector quantization (AVQ) for non-sparse signals in the modified discrete cosine transform (MDCT) domain. This paper also proposes switching the adaptive spectral masking on and off depending on whether or not the target signal is non-sparse. The switching decision is based on the results of MDCT-domain sparseness analysis. When the target signal is categorized as non-sparse, the masking level of the target MDCT coefficients is adaptively controlled using spectral envelope information. The performance of the proposed method, as a part of ITU-T G.711.1 Annex D, is evaluated in comparison with conventional AVQ. Subjective listening test results showed that the proposed method improves sound quality by more than 0.1 points on a five-point scale on average for speech, music, and mixed content, which indicates a significant improvement.

  • LP/WLP Hybrid Scheme for Quality Improvement of TCX Coders Operating at Low Bit Rates

    Tung-chin LEE  Young-cheol PARK  Dae-hee YOUN  

     
    LETTER-Speech and Hearing

      Vol:
    E95-D No:7
      Page(s):
    2017-2020

    In this paper, we propose a switchable linear prediction (LP)/warped linear prediction (WLP) hybrid scheme for the transform coded excitation (TCX) coder, which is adopted as a core codec in AMR-WB+ and USAC. The proposed algorithm selects either an LP or a WLP filter on a per-frame basis. To provide smooth transitions between LP and WLP frames, a window switching scheme is developed using sine and rectangular windows. In addition, a Gaussian Mixture Model (GMM)-based classification module is used to determine the prediction mode. A subjective listening test confirmed that the proposed LP/WLP switching scheme offers improved sound quality.

  • MPEG-2/4 Low-Complexity Advanced Audio Coding Optimization and Implementation on DSP

    Bing-Fei WU  Hao-Yu HUANG  Yen-Lin CHEN  Hsin-Yuan PENG  Jia-Hsiung HUANG  

     
    PAPER-Speech and Hearing

      Vol:
    E93-D No:5
      Page(s):
    1225-1237

    This study presents several optimization approaches for the MPEG-2/4 Advanced Audio Coding (AAC) Low Complexity (LC) encoding and decoding processes. Considering the power consumption and the peripherals required for consumer electronics, this study adopts the TI OMAP5912 platform for portable devices. An important optimization issue for implementing an AAC codec on embedded and mobile devices is to reduce computational complexity and memory consumption. Because of power-saving requirements, most embedded and mobile systems can provide only very limited computational power and memory resources for the coding process. As a result, modifying and simplifying only one or two blocks is insufficient for optimizing the AAC encoder and enabling it to work well on embedded systems. It is therefore necessary to enhance the computational efficiency of other important modules in the encoding algorithm. This study focuses on optimizing the Temporal Noise Shaping (TNS), Mid/Side (M/S) Stereo, Modified Discrete Cosine Transform (MDCT), and Inverse Quantization (IQ) modules in the encoder and decoder. We also propose an efficient memory reduction approach that provides a satisfactory balance between the reduction of memory usage and the expansion of the encoded files. In the proposed design, both the AAC encoder and decoder are built with fixed-point arithmetic operations and implemented on a DSP processor combined with an ARM core for peripheral control. Experimental results demonstrate that the proposed AAC codec is computationally efficient, has low memory consumption, and is suitable for low-cost embedded and mobile applications.

  • Bandwidth-Scalable Stereo Audio Coding Based on a Layered Structure

    Young Han LEE  Deok Su KIM  Hong Kook KIM  Jongmo SUNG  Mi Suk LEE  Hyun Joo BAE  

     
    LETTER-Speech and Hearing

      Vol:
    E92-D No:12
      Page(s):
    2540-2544

    In this paper, we propose a bandwidth-scalable stereo audio coding method based on a layered structure. The proposed stereo coding method encodes super-wideband (SWB) stereo signals and is able to decode either wideband (WB) stereo signals or SWB stereo signals, depending on network congestion. The performance of the proposed stereo coding method is then compared with that of a conventional stereo coding method that separately decodes WB or SWB stereo signals, in terms of subjective quality, algorithmic delay, and computational complexity. Experimental results show that when stereo audio signals sampled at 32 kHz are compressed to 64 kbit/s, the proposed method provides significantly better audio quality with a 64-sample shorter algorithmic delay and comparable computational complexity.

  • Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    Jong Kyu KIM  Nam Soo KIM  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:6
      Page(s):
    1830-1833

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. To reduce computation while maintaining good performance, a decision tree classifier is adopted, with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. In an evaluation on a database covering both speech and music materials, the proposed method achieves much better mode selection accuracy than the open-loop mode selection module in AMR-WB+.

  • Frame Splitting Scheme for Error-Robust Audio Streaming over Packet-Switching Networks

    Jong Kyu KIM  Jung Su KIM  Hwan Sik YUN  Joon-Hyuk CHANG  Nam Soo KIM  

     
    LETTER-Multimedia Systems for Communications

      Vol:
    E91-B No:2
      Page(s):
    677-680

    This letter presents a novel frame splitting scheme for error-robust audio streaming over packet-switching networks. In our approach to perceptual audio coding, an audio frame is split into several subframes based on the network configuration such that each packet can be decoded independently at the receiver. A subjective comparison category rating (CCR) test shows that our approach enhances the quality of the decoded audio signal in lossy packet-switching network environments.

  • Bandwidth Extension with Hybrid Signal Extrapolation for Audio Coding

    Chatree BUDSABATHON  Akinori NISHIHARA  

     
    PAPER

      Vol:
    E90-A No:8
      Page(s):
    1564-1569

    In this paper, we propose a blind method that uses hybrid signal extrapolation at the decoder to regenerate high-frequency components removed by the encoder. First, the spectral resolution of the decoded signal is enhanced by time-domain linear predictive extrapolation, and then the cutoff frequency of each frame is estimated to avoid a spectral gap between the end of the original low-frequency spectrum and the beginning of the reconstructed high-frequency spectrum. Exploiting the correlation between the high-frequency and low-frequency spectra, the low-frequency spectral components are used to reconstruct the high-frequency components by frequency-domain linear predictive extrapolation. Experimental results show that the proposed method is effective in terms of both SNR and human listening test results. The proposed method can be used to reconstruct the lost high-frequency components and improve the perceptual quality of audio independently of the compression method.

  • Lossless Scalable Audio Coding and Quality Enhancement

    Takehiro MORIYA  Akio JIN  Takeshi MORI  Kazunaga IKEDA  Takao KANEKO  

     
    PAPER-Speech and Audio Coding

      Vol:
    E86-D No:3
      Page(s):
    425-429

    This paper proposes a lossless scalable audio coding scheme and quality enhancement processing at the decoder to compensate for missing scalable units of information. Bit rate scalability is achieved by combining high-compression coding, such as MPEG-4, with horizontal bit slicing of the PCM-coded error signal between the original waveform and the locally reconstructed MPEG-4 signal. The horizontally sliced stream may be transported through an IP network with priority. Even if some units are missing at the decoder, a reasonable-quality waveform can be reconstructed because the important packets are preserved. In addition, quality enhancement procedures including scale adjustment and post-processing are proposed. The scale adjustment eliminates unnecessary zeros, and the post-processing recovers the spectral envelope characteristics of the original input signal. Objective quality evaluation confirms that the two techniques are useful for quality enhancement when lower-priority packets are lost. This scheme enables graceful degradation by supporting lossless, near-lossless, and high-compression coding within a single scalable framework, and is useful for narrowband to broadband audio streaming.

  • Perfect Reconstruction Conditions for Adaptive Blocksize MDCT

    Takashi MOCHIZUKI  

     
    PAPER-Digital Image Processing

      Vol:
    E77-A No:5
      Page(s):
    894-899

    This paper describes the general conditions for perfect signal reconstruction in adaptive blocksize MDCT. The MDCT, or modified discrete cosine transform, is a transform in which adjacent blocks overlap. Because of this overlap, care must be taken to reconstruct the signals perfectly in adaptive blocksize schemes. The perfect reconstruction conditions are derived by considering the reconstructed signals on a segment-by-segment basis. These conditions restrict the analysis/synthesis windows in the MDCT formula. Finally, this paper evaluates two examples of window sets, including the windows used in the ISO MPEG audio coding standard.
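The Nishimura entry combines a variable expansion factor with prediction error expansion. The sketch below illustrates plain prediction error expansion only and makes several assumptions:

- It uses a fixed expansion factor of 2 and a hypothetical fixed second-order predictor, 2*x[n-1] - x[n-2].
- The paper's variable expansion factor, framing, and overflow handling are not reproduced.
- Integer samples are assumed never to leave their valid range after embedding.

Each payload bit b turns a prediction error e into 2e + b. The decoder undoes this sample by sample, predicting from samples it has already restored. The names `predict`, `embed`, and `extract` are illustrative.

```python
from typing import List, Tuple


def predict(prev1: int, prev2: int) -> int:
    # Hypothetical fixed second-order linear predictor (not the paper's predictor).
    return 2 * prev1 - prev2


def embed(host: List[int], bits: List[int]) -> List[int]:
    """Embed bits by expanding prediction errors: e -> 2e + b (no overflow handling)."""
    if len(bits) > len(host) - 2:
        raise ValueError("payload exceeds capacity")
    stego = list(host[:2])  # the first two samples only seed the predictor
    for n in range(2, len(host)):
        p = predict(host[n - 1], host[n - 2])  # predict from original samples
        if n - 2 < len(bits):
            e = host[n] - p
            stego.append(p + 2 * e + bits[n - 2])
        else:
            stego.append(host[n])  # samples past the payload are left unchanged
    return stego


def extract(stego: List[int], n_bits: int) -> Tuple[List[int], List[int]]:
    """Recover the payload and restore the host exactly."""
    host = list(stego[:2])
    bits = []
    for n in range(2, len(stego)):
        # Predict from already-restored host samples, so predictions match the encoder's.
        p = predict(host[n - 1], host[n - 2])
        if n - 2 < n_bits:
            e_exp = stego[n] - p
            bits.append(e_exp % 2)         # the hidden bit is the parity of 2e + b
            host.append(p + e_exp // 2)    # floor division recovers e, also for e < 0
        else:
            host.append(stego[n])
    return host, bits
```

The decoder predicts from restored host samples rather than stego samples. Its predictions therefore match the encoder's exactly, which is what makes the recovery lossless.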
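The Budsabathon and Nishihara entry first lengthens each decoded frame with time-domain linear predictive extrapolation, then estimates the cutoff frequency. The sketch below covers that first step only. It assumes a least-squares predictor fitted with NumPy, and the order, fitting method, and helper names `fit_lp` and `lp_extrapolate` are hypothetical, not taken from the paper. The predictor is fitted to the available samples and then run recursively to generate new ones. Cutoff estimation and the frequency-domain extrapolation are not shown.

```python
import numpy as np


def fit_lp(x: np.ndarray, order: int) -> np.ndarray:
    """Least-squares predictor coefficients a with x[n] ~ sum_j a[j] * x[n-1-j]."""
    # Each row holds the `order` previous samples, most recent first.
    rows = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    targets = x[order:]
    a, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return a


def lp_extrapolate(x: np.ndarray, order: int, n_new: int) -> np.ndarray:
    """Continue x by n_new samples, feeding predictions back into the predictor."""
    a = fit_lp(x, order)
    buf = list(x)
    for _ in range(n_new):
        recent = np.array(buf[-order:][::-1])  # most recent sample first
        buf.append(float(a @ recent))
    return np.array(buf[len(x):])
```

A single sinusoid obeys an exact second-order recursion, so an order-2 fit continues it almost exactly. Real audio frames need higher orders and are only approximated.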
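In the Moriya et al. entry, the PCM-coded error between the original waveform and the locally decoded MPEG-4 signal is sliced horizontally. This lets the most significant parts be sent with higher priority. The sketch below assumes a simple sign-magnitude bit-plane split with a fixed word length `n_bits`. The abstract does not describe the actual slicing format, packetization, or scale adjustment, and the helper names are hypothetical. Missing low-order planes are treated as zeros, so losing them coarsens the residual instead of breaking decoding.

```python
from typing import List, Tuple


def slice_planes(residual: List[int], n_bits: int) -> Tuple[List[int], List[List[int]]]:
    """Split integer residuals into a sign list and MSB-first bit planes."""
    if any(abs(r) >= 1 << n_bits for r in residual):
        raise ValueError("residual magnitude does not fit in n_bits")
    signs = [1 if r < 0 else 0 for r in residual]
    mags = [abs(r) for r in residual]
    # planes[0] holds the most significant bit of every sample, planes[-1] the least.
    planes = [[(m >> b) & 1 for m in mags] for b in range(n_bits - 1, -1, -1)]
    return signs, planes


def rebuild(signs: List[int], planes: List[List[int]], n_bits: int) -> List[int]:
    """Rebuild residuals from the received MSB-first prefix of planes."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        bit_pos = n_bits - 1 - i
        for j, bit in enumerate(plane):
            mags[j] |= bit << bit_pos
    return [-m if s else m for m, s in zip(mags, signs)]
```

In this scheme the decoder would add the rebuilt residual to its own MPEG-4 output. With every plane present the sum equals the original PCM waveform. With only the top planes, the result is a coarser approximation.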
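The Mochizuki entry derives perfect-reconstruction conditions for the MDCT when the block size changes. As a reference point, the sketch below covers only the fixed-blocksize case: a direct-form MDCT with 50% overlap and a sine window. That window is symmetric and satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1, so time-domain aliasing cancels on overlap-add. The 2/N inverse scaling and the helper names are choices made for this sketch. The adaptive-blocksize window sets evaluated in the paper are not reproduced.

```python
import numpy as np


def mdct_basis(N: int) -> np.ndarray:
    """Cosine kernel cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)), shape (N, 2N)."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))


def sine_window(N: int) -> np.ndarray:
    """Symmetric window of length 2N satisfying w[n]^2 + w[n+N]^2 = 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))


def analyze(x: np.ndarray, N: int) -> np.ndarray:
    """Windowed MDCT of blocks of length 2N with hop size N."""
    C, w = mdct_basis(N), sine_window(N)
    n_blocks = (len(x) - N) // N
    return np.stack([C @ (w * x[i * N:i * N + 2 * N]) for i in range(n_blocks)])


def synthesize(X: np.ndarray, N: int) -> np.ndarray:
    """Windowed inverse MDCT followed by overlap-add.

    The 2/N scaling gives unity gain for a Princen-Bradley window, and the
    aliasing terms of neighbouring blocks cancel because the window is symmetric.
    """
    C, w = mdct_basis(N), sine_window(N)
    y = np.zeros((len(X) + 1) * N)
    for i, Xi in enumerate(X):
        y[i * N:i * N + 2 * N] += w * (2.0 / N) * (C.T @ Xi)
    return y
```

Only samples covered by two blocks are reconstructed exactly, so a caller would pad the signal with N zeros at each end. The paper's contribution concerns what happens at block-size transitions, where this single-window argument no longer applies directly.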