This paper proposes a new pitch prediction method for 4 kbps CELP (Code Excited LPC) speech coding with 20 msec frame, for the future ITU-T 4 kbps speech coding standardization. In the conventional CELP speech coding, synthetic speech quality deteriorates rapidly at 4 kbps, especially for female and children's speech with short pitch period. The pitch prediction performance is significantly degraded for such speech. The important reason is that when the pitch period is shorter than the subframe length, the simple repetition of the past excitation signal based on the estimated lag, not the pitch prediction, is usually carried out in the adaptive codebook operation. The proposed pitch prediction method can carry out the pitch prediction without the above approximation by utilizing the current subframe excitation codevector signal, when the pitch prediction parameters are determined. To further improve the performance, a split vector synthesis and perceptually spectral weighting method, and a low-complexity perceptually harmonic and spectral weighting method have also been developed. The informal listening test result shows that the 4 kbps speech coder with 20 msec frame, utilizing all of the proposed improvements, achieves 0.2 MOS higher results than the coder without them.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Masahiro SERIZAWA, Kazunori OZAWA, "4 kbps Improved Pitch Prediction CELP Speech Coding with 20 msec Frame" in IEICE TRANSACTIONS on Information,
vol. E78-D, no. 6, pp. 758-763, June 1995, doi: .
Abstract: This paper proposes a new pitch prediction method for 4 kbps CELP (Code Excited LPC) speech coding with 20 msec frame, for the future ITU-T 4 kbps speech coding standardization. In the conventional CELP speech coding, synthetic speech quality deteriorates rapidly at 4 kbps, especially for female and children's speech with short pitch period. The pitch prediction performance is significantly degraded for such speech. The important reason is that when the pitch period is shorter than the subframe length, the simple repetition of the past excitation signal based on the estimated lag, not the pitch prediction, is usually carried out in the adaptive codebook operation. The proposed pitch prediction method can carry out the pitch prediction without the above approximation by utilizing the current subframe excitation codevector signal, when the pitch prediction parameters are determined. To further improve the performance, a split vector synthesis and perceptually spectral weighting method, and a low-complexity perceptually harmonic and spectral weighting method have also been developed. The informal listening test result shows that the 4 kbps speech coder with 20 msec frame, utilizing all of the proposed improvements, achieves 0.2 MOS higher results than the coder without them.
URL: https://global.ieice.org/en_transactions/information/10.1587/e78-d_6_758/_p
Copy
@ARTICLE{e78-d_6_758,
author={Masahiro SERIZAWA, Kazunori OZAWA, },
journal={IEICE TRANSACTIONS on Information},
title={4 kbps Improved Pitch Prediction CELP Speech Coding with 20 msec Frame},
year={1995},
volume={E78-D},
number={6},
pages={758-763},
abstract={This paper proposes a new pitch prediction method for 4 kbps CELP (Code Excited LPC) speech coding with 20 msec frame, for the future ITU-T 4 kbps speech coding standardization. In the conventional CELP speech coding, synthetic speech quality deteriorates rapidly at 4 kbps, especially for female and children's speech with short pitch period. The pitch prediction performance is significantly degraded for such speech. The important reason is that when the pitch period is shorter than the subframe length, the simple repetition of the past excitation signal based on the estimated lag, not the pitch prediction, is usually carried out in the adaptive codebook operation. The proposed pitch prediction method can carry out the pitch prediction without the above approximation by utilizing the current subframe excitation codevector signal, when the pitch prediction parameters are determined. To further improve the performance, a split vector synthesis and perceptually spectral weighting method, and a low-complexity perceptually harmonic and spectral weighting method have also been developed. The informal listening test result shows that the 4 kbps speech coder with 20 msec frame, utilizing all of the proposed improvements, achieves 0.2 MOS higher results than the coder without them.},
keywords={},
doi={},
ISSN={},
month={June},}
Copy
TY - JOUR
TI - 4 kbps Improved Pitch Prediction CELP Speech Coding with 20 msec Frame
T2 - IEICE TRANSACTIONS on Information
SP - 758
EP - 763
AU - Masahiro SERIZAWA
AU - Kazunori OZAWA
PY - 1995
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E78-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - June 1995
AB - This paper proposes a new pitch prediction method for 4 kbps CELP (Code Excited LPC) speech coding with 20 msec frame, for the future ITU-T 4 kbps speech coding standardization. In the conventional CELP speech coding, synthetic speech quality deteriorates rapidly at 4 kbps, especially for female and children's speech with short pitch period. The pitch prediction performance is significantly degraded for such speech. The important reason is that when the pitch period is shorter than the subframe length, the simple repetition of the past excitation signal based on the estimated lag, not the pitch prediction, is usually carried out in the adaptive codebook operation. The proposed pitch prediction method can carry out the pitch prediction without the above approximation by utilizing the current subframe excitation codevector signal, when the pitch prediction parameters are determined. To further improve the performance, a split vector synthesis and perceptually spectral weighting method, and a low-complexity perceptually harmonic and spectral weighting method have also been developed. The informal listening test result shows that the 4 kbps speech coder with 20 msec frame, utilizing all of the proposed improvements, achieves 0.2 MOS higher results than the coder without them.
ER -