Pusadee SERESANGTAKUL, Tomio TAKARA
We have developed Thai speech synthesis by rule using cepstral parameters. To synthesize the pitch contours of Thai tones, we applied an extension of Fujisaki's model. The mid tone is unique to Thai when compared with Chinese. In extending Fujisaki's model to Thai tones, we assumed that the mid tone is neutral and adopted its phrase component as the phrase component for all tones. Our analysis of the pitch contours of the five Thai tones with this model shows that the command pattern for the local F0 components requires both positive and negative commands. In listening tests, the tone error rates were 0.0%, 0.7%, and 2.7% for analysis/synthesis, Fujisaki's model, and the polynomial model, respectively. These results show that the extension of Fujisaki's model is effective for Thai.
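For readers unfamiliar with Fujisaki's model, the following is a minimal sketch of the standard command-response formulation, in which ln F0(t) is the sum of a baseline ln Fb, phrase responses Gp(t) = α²t·e^(−αt), and local (tone) responses Ga(t) = min[1 − (1 + βt)e^(−βt), γ]. The only extension shown here is the one the abstract states: tone command amplitudes may be negative as well as positive. The class names, function names, and the default values of α, β, γ are hypothetical illustrations, not the authors' implementation or fitted parameters.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseCommand:
    t0: float  # onset time of the phrase command (s)
    ap: float  # phrase command amplitude


@dataclass
class ToneCommand:
    t1: float  # onset time of the tone (local) command (s)
    t2: float  # offset time of the tone command (s)
    at: float  # amplitude; may be negative, as the extension for Thai requires


def phrase_response(t: float, alpha: float) -> float:
    # Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def tone_response(t: float, beta: float, gamma: float) -> float:
    # Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(t: float, fb: float, phrases: List[PhraseCommand],
               tones: List[ToneCommand], alpha: float = 3.0,
               beta: float = 20.0, gamma: float = 0.9) -> float:
    """F0 in Hz at time t; alpha, beta, gamma defaults are hypothetical."""
    ln_f0 = math.log(fb)
    for p in phrases:
        ln_f0 += p.ap * phrase_response(t - p.t0, alpha)
    for c in tones:
        # Each tone command contributes a step-like response between t1 and t2
        ln_f0 += c.at * (tone_response(t - c.t1, beta, gamma)
                         - tone_response(t - c.t2, beta, gamma))
    return math.exp(ln_f0)
```

Under the abstract's assumption that the mid tone is neutral, a mid-tone syllable would use only the shared phrase component, while other tones add positive or negative tone commands on top of it, raising or lowering the contour relative to that shared phrase component.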
This paper reports a new application of the Markov model to automatic speech recognition, in which the feature vectors of speech are regarded as representing both the states and the output symbols of the Markov model. The state-transition probability and the symbol-output probability are assumed to be represented by multidimensional normal density functions of the feature vector. The DP-matching algorithm is used to compute the optimal time sequence of the observed feature vectors. To confirm the efficiency of this system, we experimentally compared its performance with that of other approaches, such as those using the Mahalanobis distance or the Euclidean distance. In speaker-independent experiments with a vocabulary of Japanese single-digit and four-digit numerals, the proposed system proved more effective than the others.
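To illustrate how DP matching can be driven by normal-density scores instead of distances, the sketch below aligns input frames to a left-to-right sequence of reference states, scoring each frame by a Gaussian log-density and letting the state index either stay or advance by one. This is a simplified sketch under stated assumptions, not the paper's method: it uses diagonal covariances, scores only the output densities (transitions are treated as uniform rather than modelled by a normal density), and the function names and toy word models ("ichi", "ni") are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[Vector], List[Vector]]  # (state means, state variances)


def gaussian_log_density(x: Vector, mean: Vector, var: Vector) -> float:
    # Log of a multivariate normal density with diagonal covariance (assumption)
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def dp_match_score(frames: List[Vector], means: List[Vector],
                   variances: List[Vector]) -> float:
    """Best total log-density over left-to-right alignments of frames to states."""
    n, s = len(frames), len(means)
    neg_inf = float("-inf")
    score = [[neg_inf] * s for _ in range(n)]
    score[0][0] = gaussian_log_density(frames[0], means[0], variances[0])
    for i in range(1, n):
        for j in range(s):
            # Either stay in state j or advance from state j - 1
            best_prev = score[i - 1][j]
            if j > 0:
                best_prev = max(best_prev, score[i - 1][j - 1])
            if best_prev == neg_inf:
                continue
            score[i][j] = best_prev + gaussian_log_density(
                frames[i], means[j], variances[j])
    # Alignment must end in the final state
    return score[n - 1][s - 1]


def recognize(frames: List[Vector], models: Dict[str, Model]) -> str:
    # Choose the word model with the highest DP-matching log-density score
    return max(models, key=lambda w: dp_match_score(frames, *models[w]))
```

Replacing `gaussian_log_density` with a negative Euclidean or Mahalanobis distance turns the same recursion into the kind of distance-based baseline the abstract compares against.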