Shuuichi ARAI, Arata MIYAUCHI, Shinji OZAWA
In general, analysis-synthesis systems are constructed on a linear frequency scale. The frequency resolution of the human auditory system, however, is non-linear. It is therefore of interest to study analysis-synthesis systems on a non-linear frequency scale such as the Mel scale. It is also well known that the LSP analysis-synthesis method is superior to the LPC and PARCOR methods in frame rate and quantization characteristics. In this paper, we describe an LSP analysis-synthesis system on the Mel frequency scale. First, we propose a way to obtain LSP parameters on the Mel frequency scale (Mel LSP parameters) from the speech signal in the linear time domain. Next, we propose how to construct the analysis and synthesis filters in the linear time domain using the Mel LSP parameters. Furthermore, we combine this system with the ordinary LSP analysis-synthesis system to improve the quality of the synthetic speech. We carried out experiments to clarify the characteristics of the combined system. The test results show that, when the total prediction order is 10, the quality of synthetic speech from the combined system is higher than that from either the ordinary LSP system or the Mel LSP system. Further experiments confirm that the synthetic speech quality of the combined system is as good as that of the standard LSP system at prediction order 12.
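To make the idea of a non-linear frequency scale concrete, the following is a minimal sketch of a Hz-to-Mel mapping. It assumes the widely used logarithmic approximation m = 2595 log10(1 + f/700); the abstract does not state which Mel approximation or warping the authors use, so the formula and the helper names (hz_to_mel, mel_to_hz, mel_spaced_frequencies) are illustrative assumptions, not the paper's method.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the Mel scale (assumed common approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_frequencies(f_max_hz: float, count: int) -> list:
    """Return `count` frequencies in Hz, uniformly spaced on the Mel scale.

    On the linear Hz axis the points are dense at low frequencies and
    sparse at high frequencies, mirroring the resolution of hearing.
    """
    m_max = hz_to_mel(f_max_hz)
    return [mel_to_hz(m_max * k / (count - 1)) for k in range(count)]


if __name__ == "__main__":
    # 5000 Hz upper band edge is a hypothetical value chosen for illustration.
    for f in mel_spaced_frequencies(5000.0, 11):
        print(f"{f:8.1f} Hz")
```

Points that are equally spaced in Mel become progressively wider apart in Hz, which is why a fixed number of parameters on the Mel scale devotes more resolution to the low-frequency region.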
Information compression by LSP analysis-synthesis is a promising method in the sense that relatively high-quality speech can be synthesized from a small amount of code. To date, the coding of LSP parameters has been investigated from various viewpoints. Speech information rarely changes rapidly with time, and there is a correlation between two LSP time series that are adjacent on the frequency axis. We have already proposed a coding method that takes these two features of LSP parameters into account. However, redundancy still remains in the surrounding LSP parameters. This paper proposes a new LSP coding method that employs fuzzy reasoning. With fuzzy reasoning, the useful information contained in the surrounding LSP parameters can be exploited by the new coding method. Using the proposed method, we performed a coding experiment and a quality evaluation of the synthetic speech, in comparison with our previous method, at an analysis order of 10. The experiments confirmed two principal results. First, the number of transmitted bits required to suppress the spectral envelope distortion to 1 dB or less falls to between 20.8 and 34.7 bits per frame. Second, this coding method needs about 2.8 kbit/s to transmit the vocal tract information while suppressing the spectral envelope distortion, including the time distortion, to 1 dB.
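As an illustration of what a spectral envelope distortion measured in dB can look like, the sketch below computes the root-mean-square difference between the log spectral envelopes of two all-pole (LPC) filters. This is one common definition of log spectral distortion; the abstract does not give the authors' exact measure, so the definition, the function names, and the example coefficients are assumptions made for illustration only.

```python
import cmath
import math


def lpc_envelope_db(a, n_points=256):
    """Log spectral envelope 1/|A(e^{jw})|^2 in dB on n_points in [0, pi).

    `a` holds the inverse-filter coefficients [1, a1, ..., ap].
    """
    envelope = []
    for k in range(n_points):
        w = math.pi * k / n_points
        # Evaluate A(z) = sum_i a[i] * z^(-i) on the unit circle.
        big_a = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        envelope.append(-20.0 * math.log10(abs(big_a)))
    return envelope


def spectral_distortion_db(a_ref, a_test, n_points=256):
    """RMS difference in dB between two LPC log spectral envelopes."""
    s_ref = lpc_envelope_db(a_ref, n_points)
    s_test = lpc_envelope_db(a_test, n_points)
    mean_sq = sum((x - y) ** 2 for x, y in zip(s_ref, s_test)) / n_points
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    # Hypothetical first-order filters: an original and a quantized version.
    original = [1.0, -0.90]
    quantized = [1.0, -0.88]
    print(f"{spectral_distortion_db(original, quantized):.3f} dB")
```

Under this kind of measure, a coder is tuned so that the distortion between the envelope from the unquantized parameters and the envelope from the decoded parameters stays at or below a target such as 1 dB.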