S. A. Asghar BEHESHTI SHIRAZI Yoshitaka MORIKAWA Hiroshi HAMADA
This work addresses the problems of bit allocation and coding gain in subband coding systems with non-paraunitary filter banks. Since energy conservation does not hold in non-paraunitary filter banks, the choice of quantizer model is important for evaluating the output distortion introduced by subband signal quantization. To evaluate the overall distortion, we start by adopting the gain plus additive noise model for quantizers, which is more reliable than the additive noise model. With this model, the expression for the overall reconstruction error variance becomes so complicated that the problem of optimum bit allocation, as required for evaluation of the coding gain, must be solved numerically. We therefore propose an approximation method in which, assuming sufficiently high bit rate coding, we neglect the terms due to correlation among quantization errors when calculating the bit allocation but take them into account when evaluating the coding gain. Application of this approximation method to SSKF subband coding systems with an AR(1) input source shows that the method is very accurate even at low bit rates (1 bit/sample).
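As a point of reference for the bit allocation step, the following is a minimal sketch of the textbook high-rate allocation rule, in which each subband receives the average rate plus half the base-2 logarithm of its weighted variance relative to the geometric mean. It is not the expression derived in the paper: the correlation terms among quantization errors are ignored entirely, and the function names, the optional `weights` argument (a stand-in for synthesis-filter energy weights in a non-paraunitary bank), and the example variances are all assumptions made for illustration.

```python
import math

def allocate_bits(variances, avg_bits, weights=None):
    """Textbook high-rate rule: b_k = B + 0.5 * log2(w_k * var_k / geometric mean).

    `weights` is a hypothetical per-subband distortion weight (all ones here
    unless given); error correlation terms are NOT modeled.
    """
    if weights is None:
        weights = [1.0] * len(variances)
    logs = [math.log2(w * v) for w, v in zip(weights, variances)]
    log_gm = sum(logs) / len(logs)  # log2 of the geometric mean
    return [avg_bits + 0.5 * (lg - log_gm) for lg in logs]

def coding_gain(variances):
    """Arithmetic mean over geometric mean of the subband variances
    (the classical gain over PCM for an orthonormal, equal-rate split)."""
    am = sum(variances) / len(variances)
    gm = 2.0 ** (sum(math.log2(v) for v in variances) / len(variances))
    return am / gm

# Hypothetical two-band example: low band carries four times the energy.
variances = [4.0, 1.0]
bits = allocate_bits(variances, avg_bits=2.0)
gain = coding_gain(variances)
```

The allocation preserves the average rate by construction, so the low band gains exactly the half bit that the high band gives up in this two-band example. At low rates the rule can return negative values for weak subbands, which is one reason a numerical solution or a more careful approximation is needed in practice.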
Hiroshi HAMADA Satoshi MIKI Ryohei NAKATSU
A new method is proposed for automatically evaluating the English pronunciation quality of non-native speakers. It is assumed that pronunciation can be rated using three criteria: the static characteristics of phonetic spectra, the dynamic structure of spectrum sequences, and the prosodic characteristics of utterances. The evaluation uses speech recognition techniques to compare the English words pronounced by a non-native speaker with those pronounced by a native speaker. Three evaluation measures are proposed to rate pronunciation quality. (1) The standard deviation of the mapping vectors, which map the codebook vectors of the non-native speaker onto the vector space of the native speaker, is used to evaluate the static phonetic spectra characteristics. (2) The spectral distance between words pronounced by the non-native speaker and those pronounced by the native speaker, obtained by the dynamic time warping (DTW) method, is used to evaluate the dynamic characteristics of spectral sequences. (3) The differences in fundamental frequency and speech power between the pronunciations of the native and non-native speakers are used as the criteria for evaluating prosodic characteristics. Evaluation experiments are carried out using 441 words spoken by 10 Japanese speakers and 10 native speakers. One half of the 441 words was used to evaluate static phonetic spectra characteristics, and the other half was used to evaluate the dynamic characteristics of spectral sequences, as well as the prosodic characteristics. Based on the experimental results, the correlation between the evaluation scores and the scores determined by human judgement is found to be 0.90.
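Measure (2) relies on a DTW distance between two spectral sequences. The sketch below shows the basic dynamic-programming recurrence only; the Euclidean frame distance, the unconstrained step pattern, the lack of path-length normalization, and the toy two-dimensional "frames" are all assumptions for illustration, since the abstract does not specify the features or DTW variant actually used.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses Euclidean distance between frames and the basic three-way
    recurrence (insert, delete, match); no slope constraints or
    normalization are applied.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical frames: the second utterance holds the middle frame longer.
native = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
learner = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
stretched_cost = dtw_distance(native, learner)
```

Because the warping path may repeat frames, a pure difference in speaking rate costs nothing in this example, so the remaining distance reflects differences in the spectral sequence itself rather than in timing.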
S. A. Asghar BEHESHTI SHIRAZI Yoshitaka MORIKAWA Hiroshi HAMADA
This paper deals with improving the performance of transform and subband image coding systems with negatively correlated input signals. Using a more general source model than the AR(1) model as an input, the coding performance of the transform and subband coding schemes is evaluated in terms of the coding gain over PCM. The source model used here has resonant band characteristics such that its power spectrum has a peak at some frequency between 0 and π/2 for positive autocorrelation and between π/2 and π for negative autocorrelation. It is shown that coding schemes fall into two classes: one has the pairwise mirror-image property in its filter banks and performs symmetrically regardless of the sign of the autocorrelation, and the other lacks that property and performs asymmetrically, with inferior performance for negative autocorrelation. Among the well-known transform and subband coding schemes, the DHT and QMF coding systems belong to the former class and the DCT and SSKF coding systems to the latter. To remedy the inferior performance, we propose a method that modulates the negatively correlated signal sequences by the alternating-sign signal with unity magnitude, (-1)^n, to convert them into positively correlated sequences. Algorithms are presented for the DCT and SSKF image coding systems with the adaptive signal modulation. In the DCT coding systems, we are particularly concerned with the DCT-based hierarchical progressive coding mode of operation, since the signal modulation works well for that coding mode. The SSKF image coding system has a regular quad-tree structure with three stages. The simulation results for test images show that our method can be applied successfully to images with a considerable amount of energy in the frequency range above π/2 in the horizontal or vertical direction, such as fingerprints and textile patterns sampled at a rate close to the Nyquist rate.
The paper closes with a brief introduction to the modification of our DCT-based method.
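The effect of modulating by (-1)^n can be checked on a one-dimensional toy signal. The sketch below generates an AR(1) sequence with a negative coefficient, flips the sign of every other sample, and compares the lag-1 correlation before and after. The coefficient -0.9, the sequence length, the seed, and all function names are assumptions for illustration; the adaptive, image-domain algorithms described in the abstract are not reproduced here.

```python
import random

def ar1(rho, n, seed=0):
    """Generate n samples of x[k] = rho * x[k-1] + e[k] with unit-variance Gaussian e."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def lag1_correlation(x):
    """Sample autocorrelation coefficient at lag 1."""
    mean = sum(x) / len(x)
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    return c1 / c0

def modulate(x):
    """Multiply sample n by (-1)^n, i.e. negate the odd-indexed samples."""
    return [v if i % 2 == 0 else -v for i, v in enumerate(x)]

signal = ar1(-0.9, 5000)        # negatively correlated source
modulated = modulate(signal)    # spectrum shifted by pi
before = lag1_correlation(signal)
after = lag1_correlation(modulated)
```

Multiplying by (-1)^n shifts the spectrum by π, so a peak above π/2 moves below π/2 and the lag-1 correlation changes sign. The operation is its own inverse, so a decoder can undo it by applying the same modulation again.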