Masahiro FUKUI Shigeaki SASAKI Yusuke HIWASAKI Kimitaka TSUTSUMI Sachiko KURIHARA Hitoshi OHMURO Yoichi HANEDA
We propose a new adaptive spectral masking method for algebraic vector quantization (AVQ) of non-sparse signals in the modified discrete cosine transform (MDCT) domain. We also propose switching the adaptive spectral masking on and off depending on whether or not the target signal is non-sparse. The switching decision is based on the results of MDCT-domain sparseness analysis. When the target signal is categorized as non-sparse, the masking level of the target MDCT coefficients is adaptively controlled using spectral envelope information. The performance of the proposed method, as a part of ITU-T G.711.1 Annex D, is evaluated in comparison with conventional AVQ. Subjective listening test results showed that the proposed method improves sound quality by more than 0.1 points on a five-point scale on average for speech, music, and mixed content, which indicates a significant improvement.
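The abstract does not state which sparseness measure drives the on/off decision. As a minimal sketch of the switching idea only, the following assumes the Hoyer sparseness measure and an arbitrary threshold of 0.5; the function names, the measure, and the threshold are all illustrative assumptions, not the criterion used in G.711.1 Annex D.

```python
import math

def hoyer_sparseness(coeffs):
    """Hoyer sparseness in [0, 1] (an assumed measure, not the paper's).

    Returns 1.0 when a single coefficient carries all the energy and
    0.0 when all coefficients have equal magnitude.
    """
    n = len(coeffs)
    l1 = sum(abs(c) for c in coeffs)
    l2 = math.sqrt(sum(c * c for c in coeffs))
    if n < 2 or l2 == 0.0:
        return 0.0  # treat empty or silent frames as non-sparse
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)

def use_adaptive_masking(coeffs, threshold=0.5):
    """Switch masking on only for frames classified as non-sparse.

    The threshold value is a hypothetical tuning parameter.
    """
    return hoyer_sparseness(coeffs) < threshold
```

With this sketch, a flat frame such as `[1, 1, 1, 1]` is classified as non-sparse and would enable masking, while a single-peak frame such as `[0, 0, 0, 1]` would not.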
Hae-Yong YANG Kyung-Hoon LEE Sung-Jea KO
We present an improvement to the existing steganography-based bandwidth extension scheme. Enhanced wideband (WB) speech quality is achieved by embedding multiple highband spectral gains into a G.711 bitstream. The number of spectral gains is selected by optimizing the quantity of embedded data with respect to the quality of the extended WB speech. Compared to the existing method, the proposed scheme improves the WB PESQ (Perceptual Evaluation of Speech Quality) score by 0.334 with negligible degradation of the embedded narrowband speech.
Akinori ITO Shun'ichiro ABE Yoiti SUZUKI
In this paper, we propose a novel data hiding technique for G.711-coded speech based on the LSB substitution method. The novel feature of the proposed method is that a low-bitrate encoder, G.726 ADPCM, is used as a reference for deciding how many bits can be embedded in each sample. Experiments showed that the method outperformed the simple LSB substitution method and the selective embedding method proposed by Aoki. We achieved embedding at 4 kbit/s with almost no subjective degradation of speech quality, and at 10 kbit/s while maintaining good quality.
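The LSB substitution step itself can be sketched as follows. This is only an illustration of variable-depth LSB embedding on 8-bit codewords: the per-sample bit counts in `bits_per_sample` are taken as given, whereas the paper derives them from a G.726 reference by a rule the abstract does not describe. The function names and the payload layout (most significant bit first) are assumptions.

```python
def embed_lsb(codewords, bits_per_sample, payload_bits):
    """Overwrite the k lowest bits of each 8-bit codeword with payload bits.

    bits_per_sample[i] is the embedding depth k allowed for sample i
    (assumed to be supplied by some external decision rule).
    Returns the modified codewords and the number of bits embedded.
    """
    out = []
    pos = 0
    for cw, k in zip(codewords, bits_per_sample):
        k = min(k, len(payload_bits) - pos)  # do not read past the payload
        if k > 0:
            value = 0
            for b in payload_bits[pos:pos + k]:
                value = (value << 1) | b
            mask = (1 << k) - 1
            cw = (cw & ~mask & 0xFF) | value
            pos += k
        out.append(cw)
    return out, pos

def extract_lsb(codewords, bits_per_sample, total_bits):
    """Read back total_bits payload bits using the same depth schedule."""
    bits = []
    for cw, k in zip(codewords, bits_per_sample):
        k = min(k, total_bits - len(bits))
        for shift in range(k - 1, -1, -1):
            bits.append((cw >> shift) & 1)
    return bits
```

Since G.711 carries 8000 samples per second, the reported 4 kbit/s and 10 kbit/s rates correspond to an average of 0.5 and 1.25 embedded bits per sample, respectively.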
This study proposes a technique of lossless steganography for G.711, the most common codec for digital speech communications systems such as VoIP. The proposed technique exploits the characteristics of G.711 for embedding steganogram information without degradation. This paper shows the capacity of the proposed technique.
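The abstract does not say which characteristic of G.711 is exploited. One property that permits embedding without changing the decoded waveform is that the mu-law variant has two codewords, 0xFF and 0x7F, that both decode to zero amplitude. The sketch below assumes mu-law and uses that redundancy purely as an illustration; it should not be read as the method of the paper, and the function names are hypothetical.

```python
POS_ZERO = 0xFF  # mu-law codeword for +0
NEG_ZERO = 0x7F  # mu-law codeword for -0 (decodes to the same amplitude)

def embed_in_zeros(codewords, payload_bits):
    """Carry one payload bit in each zero-amplitude mu-law codeword.

    Bit 0 is written as +0 and bit 1 as -0, so the decoded signal is
    unchanged. Returns the new codewords and the number of bits embedded.
    """
    out = []
    pos = 0
    for cw in codewords:
        if cw in (POS_ZERO, NEG_ZERO) and pos < len(payload_bits):
            cw = NEG_ZERO if payload_bits[pos] else POS_ZERO
            pos += 1
        out.append(cw)
    return out, pos

def extract_from_zeros(codewords, total_bits):
    """Recover total_bits payload bits from the zero-amplitude codewords."""
    bits = []
    for cw in codewords:
        if len(bits) == total_bits:
            break
        if cw == POS_ZERO:
            bits.append(0)
        elif cw == NEG_ZERO:
            bits.append(1)
    return bits
```

Under this assumption the capacity equals the number of zero-amplitude samples in the stream, so it varies with the signal and is largest during silence.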
Yusuke HIWASAKI Hitoshi OHMURO Takeshi MORI Sachiko KURIHARA Akitoshi KATAOKA
This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it is highly interoperable with existing terminals, so costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multipoint remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker location is selected from among the remote locations, and its higher-band signal is redistributed together with the mixed lower-band signal. Subjective evaluations show that the speech quality can be maintained even when the speech signals are only partially mixed.
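The partial mixing idea can be sketched as below. The abstract does not say how the main speaker location is chosen, so this sketch assumes the location with the largest lower-band frame energy; the data layout (a dict per location with decoded lower-band samples under `low` and an undecoded higher-band payload under `high`) and the function name are likewise hypothetical.

```python
def partial_mix(sites):
    """Mix only the lower band and forward one higher-band payload.

    sites: list of dicts, each with
      'low'  - list of decoded 16-bit lower-band PCM samples for one frame
      'high' - opaque higher-band bitstream bytes for the same frame
    Returns (mixed lower band, forwarded higher-band bytes, main index).
    """
    # Assumed selection rule: highest lower-band energy is the main speaker.
    main = max(range(len(sites)),
               key=lambda i: sum(s * s for s in sites[i]['low']))

    # Sum the lower-band samples of all locations, then clip to 16 bits.
    frame_len = len(sites[0]['low'])
    mixed = [0] * frame_len
    for site in sites:
        for i, sample in enumerate(site['low']):
            mixed[i] += sample
    mixed = [max(-32768, min(32767, s)) for s in mixed]

    # The higher band is not decoded or mixed, only passed through.
    return mixed, sites[main]['high'], main
```

For simplicity the sketch produces a single mix of all locations; a conference bridge would normally exclude each listener's own signal from the mix sent back to that listener.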