Keiichiro OURA Heiga ZEN Yoshihiko NANKAKU Akinobu LEE Keiichi TOKUDA
A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
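As a rough illustration of why tying covariance matrices shrinks the footprint, the sketch below counts stored parameters for Gaussian state distributions. It assumes diagonal covariances, and the state count and feature dimension are hypothetical values chosen only for the arithmetic; the abstract does not give the actual model sizes, and this is not the authors' clustering procedure.

```python
def parameter_count(num_states: int, dim: int, tie_covariances: bool) -> int:
    """Count stored values for diagonal-covariance Gaussian state distributions.

    Hypothetical helper for illustration; not taken from the paper.
    """
    mean_params = num_states * dim
    # With tying, a single diagonal covariance is shared by every state.
    cov_params = dim if tie_covariances else num_states * dim
    return mean_params + cov_params


# Assumed sizes: 1000 clustered states, 120-dimensional feature vectors.
untied = parameter_count(num_states=1000, dim=120, tie_covariances=False)
tied = parameter_count(num_states=1000, dim=120, tie_covariances=True)
print(untied, tied, tied / untied)
```

Under these assumptions the tied model stores roughly half as many values, since the per-state covariance entries collapse into one shared set.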
Takatoshi JITSUHIRO Tomoko MATSUI Satoshi NAKAMURA
We propose a new method to introduce the Minimum Description Length (MDL) criterion to the automatic generation of non-uniform, context-dependent HMM topologies. Phonetic decision tree clustering is widely used, based on the Maximum Likelihood (ML) criterion, and only creates contextual variations. However, the ML criterion needs to predetermine control parameters, such as the total number of states, empirically for use as stop criteria. Information criteria have been applied to solve this problem for decision tree clustering. However, decision tree clustering cannot create topologies with various state lengths automatically. Therefore, we propose a method that applies the MDL criterion as split and stop criteria to the Successive State Splitting (SSS) algorithm as a means of generating contextual and temporal variations. This proposed method, the MDL-SSS algorithm, can automatically create adequate topologies without such predetermined parameters. Experimental results for travel arrangement dialogs and lecture speech show that the MDL-SSS can automatically stop splitting and obtain more appropriate HMM topologies than the original one.
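The idea of using MDL as both split and stop criterion can be sketched as follows: a split is kept only if the log-likelihood gain exceeds the description-length penalty for the added parameters. This is a minimal sketch assuming 1-D Gaussian states and the common two-part penalty of (k/2) log N; the function names, the variance floor, and the penalty form are assumptions, since the abstract does not state the exact formulation used in MDL-SSS.

```python
import math


def gaussian_log_likelihood(samples):
    """Log-likelihood of samples under their own ML-fitted 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # Small variance floor (an assumption) to avoid log(0) on degenerate data.
    var = max(sum((x - mean) ** 2 for x in samples) / n, 1e-6)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def mdl_split_gain(left, right, total_frames, params_per_state=2):
    """Reduction in description length if one state is split into two.

    A positive value means the split shortens the description, so it is
    accepted; when no candidate split is positive, splitting stops.
    """
    parent = left + right
    ll_gain = (gaussian_log_likelihood(left)
               + gaussian_log_likelihood(right)
               - gaussian_log_likelihood(parent))
    # One extra state adds params_per_state parameters (mean and variance).
    penalty = 0.5 * params_per_state * math.log(total_frames)
    return ll_gain - penalty


separated = mdl_split_gain([0.0, 0.1, -0.1, 0.05, -0.05],
                           [10.0, 10.1, 9.9, 10.05, 9.95], total_frames=10)
overlapping = mdl_split_gain([0.0, 1.0, 2.0, 3.0],
                             [0.5, 1.5, 2.5, 3.5], total_frames=8)
print(separated > 0, overlapping > 0)
```

Because the penalty grows with the number of parameters, no preset total number of states is needed: the process ends on its own once further splits stop paying for themselves.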
Hideaki TSUCHIYA Shuichi ITOH Takeshi HASHIMOTO
An algorithm for designing a pattern classifier, which uses the MDL criterion and a binary data structure, is proposed. The algorithm gives a partitioning of the range of the multi-dimensional attribute and gives an estimated probability model for this partitioning. The volume of bins in this partitioning is upper bounded by ο((log N/N)^(K/(K+2))) almost surely, where N is the length of the training sequence and K is the dimension of the attribute. The convergence rates of the code length and the divergence of the estimated model are asymptotically upper bounded by ο((log N/N)^(2/(K+2))). The classification error is asymptotically upper bounded by ο((log N/N)^(1/(K+2))). Simulation results for 1-dimensional and 2-dimensional attribute cases show that the algorithm is practically efficient.
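To get a feel for how these bounds behave, the sketch below evaluates the rate expression numerically, reading the exponents as K/(K+2), 2/(K+2), and 1/(K+2). Constants hidden by the order notation are ignored, and the helper name and sample sizes are assumptions made only for illustration.

```python
import math


def rate(n: int, k: int, numerator: float) -> float:
    """Evaluate (log n / n) ** (numerator / (k + 2)), ignoring constants."""
    return (math.log(n) / n) ** (numerator / (k + 2))


for k in (1, 2):
    for n in (10**3, 10**6):
        bin_volume = rate(n, k, k)      # exponent K/(K+2)
        divergence = rate(n, k, 2)      # exponent 2/(K+2)
        class_error = rate(n, k, 1)     # exponent 1/(K+2)
        print(k, n, bin_volume, divergence, class_error)
```

Each quantity shrinks as N grows, and for a fixed N the classification-error term shrinks more slowly when K is larger, because its exponent 1/(K+2) gets smaller.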