Takehiro MORIYA Satoshi MIKI Kazunori MANO Hitoshi OHMURO
A speech coding scheme at 3.6 kbit/s has been proposed. The scheme is based on CELP (Code Excited Linear Prediction) with pitch synchronous innovation, which means that the random codevectors, as well as the adaptive codevectors, have pitch periodicity. The quality is comparable to that of the 6.7 kbit/s VSELP coder for the Japanese cellular radio standard.
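The idea of giving a random codevector pitch periodicity can be illustrated with a minimal sketch. It assumes that periodicity is obtained by repeating the first pitch period of the codevector across the subframe; the function name `pitch_synchronize` and its parameters are hypothetical and are not taken from the abstract.

```python
def pitch_synchronize(codevector, pitch_lag):
    """Return a copy of codevector made periodic with period pitch_lag.

    Assumption for illustration: the first pitch_lag samples are repeated
    until the subframe is filled. A lag at least as long as the subframe
    leaves the codevector unchanged.
    """
    if pitch_lag <= 0:
        raise ValueError("pitch_lag must be positive")
    n = len(codevector)
    if pitch_lag >= n:
        return list(codevector)
    # Sample i is taken from position i modulo the pitch lag.
    return [codevector[i % pitch_lag] for i in range(n)]


random_codevector = [1, 2, 3, 4, 5, 6, 7, 8]
periodic = pitch_synchronize(random_codevector, 3)
```

With a lag of 3, only the first three samples of the stored codevector contribute to the excitation, so the result repeats with the same period as the adaptive codevector.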
Masahiro FUKUI Shigeaki SASAKI Yusuke HIWASAKI Kimitaka TSUTSUMI Sachiko KURIHARA Hitoshi OHMURO Yoichi HANEDA
We propose a new adaptive spectral masking method for algebraic vector quantization (AVQ) of non-sparse signals in the modified discrete cosine transform (MDCT) domain. This paper also proposes switching the adaptive spectral masking on and off depending on whether or not the target signal is non-sparse. The switching decision is based on the results of MDCT-domain sparseness analysis. When the target signal is categorized as non-sparse, the masking level of the target MDCT coefficients is adaptively controlled using spectral envelope information. The performance of the proposed method, as a part of ITU-T G.711.1 Annex D, is evaluated in comparison with conventional AVQ. Subjective listening test results showed that the proposed method improves sound quality by more than 0.1 points on a five-point scale on average for speech, music, and mixed content, which indicates a significant improvement.
Hitoshi OHMURO Takehiro MORIYA Kazunori MANO Satoshi MIKI
This letter proposes an LSP quantizing method which uses interframe correlation of the parameters. The quantized parameters are represented as a moving average of code vectors. Using this method, LSP parameters are quantized efficiently and the degradation of decoded parameters caused by bit errors affects only a few following frames.
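The limited error propagation of a moving-average representation can be seen in a small decoder sketch. The codebook, the weights, and the function name `decode_ma` below are illustrative assumptions, not values from the letter; each decoded frame is a weighted sum of the current code vector and the previous ones.

```python
def decode_ma(indices, codebook, weights):
    """Decode frames as a moving average of code vectors.

    weights[0] applies to the current code vector, weights[1] to the
    previous one, and so on. The history starts as all zeros.
    """
    order = len(weights)
    dim = len(codebook[0])
    history = [[0.0] * dim for _ in range(order)]  # most recent first
    frames = []
    for idx in indices:
        history = [codebook[idx]] + history[:-1]
        frame = [sum(w * cv[k] for w, cv in zip(weights, history))
                 for k in range(dim)]
        frames.append(frame)
    return frames


codebook = [[0.0], [1.0], [2.0]]   # toy one-dimensional codebook
weights = [0.6, 0.4]               # hypothetical MA coefficients
clean = decode_ma([1, 1, 1, 1, 1, 1], codebook, weights)
corrupted = decode_ma([1, 1, 2, 1, 1, 1], codebook, weights)  # error in frame 2
affected = [i for i, (a, b) in enumerate(zip(clean, corrupted)) if a != b]
```

Because the decoder has no feedback from its own output, the corrupted index influences only as many frames as there are weights; here `affected` contains frames 2 and 3 and the decoder recovers from frame 4 onward.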
Yusuke HIWASAKI Hitoshi OHMURO Takeshi MORI Sachiko KURIHARA Akitoshi KATAOKA
This paper proposes a wideband speech coder in which a G.711 bitstream is embedded. This coder has an advantage over conventional coders in that it has high interoperability with existing terminals, so costly transcoding involving decoding and re-encoding can be avoided. We also propose a partial mixing method that effectively reduces the mixing complexity in multiple-point remote conferences. To reduce the complexity, we take advantage of the scalable structure of the bitstream and mix only the lower band of the signal. For the higher band, the main speaker location is selected from among the remote locations, and its higher-band signal is redistributed together with the mixed lower-band signal. Through subjective evaluations, we show that the speech quality can be maintained even when the speech signals are partially mixed.
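The partial mixing idea can be sketched on already-decoded band signals. The abstract does not state how the main speaker is chosen, so the sketch assumes the location with the highest lower-band energy; the function name `partial_mix` and the toy signals are hypothetical.

```python
def partial_mix(low_bands, high_bands):
    """Mix only the lower bands and forward one location's higher band.

    low_bands and high_bands hold one list of samples per location.
    Assumption for illustration: the main speaker is the location with
    the largest lower-band energy in the current frame.
    """
    n = len(low_bands[0])
    # Lower band: sample-wise sum over all locations.
    mixed_low = [sum(band[i] for band in low_bands) for i in range(n)]
    # Higher band: pick a single location instead of mixing.
    energies = [sum(x * x for x in band) for band in low_bands]
    main = max(range(len(low_bands)), key=energies.__getitem__)
    return mixed_low, high_bands[main], main


low = [[1, 1], [3, -3], [0, 1]]
high = [[10, 10], [20, 20], [30, 30]]
mixed_low, selected_high, main_speaker = partial_mix(low, high)
```

Only the lower band needs to be summed, while the higher band of the selected location is passed through unchanged, which is where the reduction in mixing complexity comes from.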