Author Search Result

[Author] Yong ZHAO (3 hits)

  • Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units

    Lijuan WANG  Yong ZHAO  Min CHU  Frank K. SOONG  Jianlai ZHOU  Zhigang CAO

    PAPER-Speech Synthesis

    Vol: E89-D  No: 3  Page(s): 1082-1091

    To produce high-quality synthesis, a concatenation-based Text-to-Speech (TTS) system usually requires a large number of segmental units to cover various acoustic-phonetic contexts. However, careful manual labeling and segmentation by human experts, which is still the most reliable way to prepare such units, is labor-intensive. In this paper we adopt a two-step procedure to automate the labeling, segmentation and refinement process. In the first step, coarse segmentation of speech data is performed by aligning speech signals with the corresponding sequence of Hidden Markov Models (HMMs). In the second step, segment boundaries are refined with a proposed Context-Dependent Boundary Model (CDBM). A Classification and Regression Tree (CART) is adopted to organize available data into a structured hierarchical tree, where acoustically similar boundaries are clustered together to train tied CDBM models for boundary refinement. Optimal CDBM parameters and training conditions are found through a series of experimental studies. Compared with a manual segmentation reference, segmentation accuracy (within a tolerance of 20 ms) is improved by the CDBMs from 78.1% (baseline) to 94.8% in Mandarin Chinese and from 81.4% to 92.7% in English, with about 1,000 manually segmented sentences used in training the models. To further reduce the amount of manual data needed to train CDBMs for a new speaker, we adapt a well-trained CDBM via efficient adaptation algorithms. With only 10-20 manually segmented sentences as adaptation data, the adapted CDBM achieves a segmentation accuracy of 90%.

  • Multiple Gaussian Mixture Models for Image Registration

    Peng YE  Fang LIU  Zhiyong ZHAO

    LETTER-Image Processing and Video Processing

    Vol: E97-D  No: 7  Page(s): 1927-1929

    The Gaussian mixture model (GMM) has recently been applied to image registration because of its robustness and efficiency. However, in previous GMM methods, all the feature points are treated identically. By incorporating local class features, this letter proposes a multiple Gaussian mixture models (M-GMM) method for image registration. The proposed method can achieve higher accuracy with less registration time. Experiments on real image pairs further demonstrated the superiority of the proposed method.

  • Rectified Registration Consistency for Image Registration Evaluation

    Peng YE  Zhiyong ZHAO  Fang LIU

    LETTER-Image Processing and Video Processing

    Vol: E97-D  No: 9  Page(s): 2549-2551

    Registration consistency (RC) stands out among existing image registration evaluation measures as a widely used automatic measure. However, the original RC neglects the influence of image intensity variation, which leads to several problems. This letter proposes a rectified registration consistency that takes both image intensity variation and geometrical transformation into consideration. As a result, the evaluation focuses more on the geometrical transformation, since the influence of intensity variation is reduced. Experiments on real image pairs demonstrated the superiority of the proposed measure over the original RC.