
Author Search Result

[Author] Qingqing ZHANG(3hit)

  • On Hermitian LCD Generalized Gabidulin Codes

    Xubo ZHAO  Xiaoping LI  Runzhi YANG  Qingqing ZHANG  Jinpeng LIU  

     
    LETTER-Coding Theory

      Publicized:
    2021/09/13
      Vol:
    E105-A No:3
      Page(s):
    607-610

    In this paper, we study Hermitian linear complementary dual (abbreviated Hermitian LCD) rank metric codes. A class of Hermitian LCD generalized Gabidulin codes is constructed from q^m-self-dual bases of F_{q^{2m}} over F_{q^2}. Moreover, the exact number of q^m-self-dual bases of F_{q^{2m}} over F_{q^2} is derived. As a consequence, an upper bound and a lower bound on the number of the constructed Hermitian LCD generalized Gabidulin codes are determined.

  • Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval

    Qingqing ZHANG  Jielin PAN  Yang LIN  Jian SHAO  Yonghong YAN  

     
    PAPER-Acoustic Modeling

      Vol:
    E91-D No:3
      Page(s):
    514-521

    In recent decades, there has been a great deal of research into bilingual speech recognition: developing a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on a grammar-constrained Mandarin-English bilingual Speech Recognition System (MESRS) for real-world music retrieval. It tackles two of the main difficulties in deploying bilingual speech recognition systems in real-world applications. One is balancing the performance and the complexity of the system; the other is dealing effectively with matrix-language accents in the embedded language. To handle intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a single compact set of bilingual acoustic models, derived by phone set merging and clustering, is developed instead of using a separate monolingual model for each language. A novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method; experiments show that TCM achieves better performance. Since the native language of potential system users is Mandarin, which is regarded as the matrix language in our application, their pronunciation of English, the embedded language, usually carries a Mandarin accent. To deal with these matrix-language accents, different non-native adaptation approaches are investigated. Experiments show that the model retraining method outperforms other common adaptation methods such as Maximum A Posteriori (MAP).
    By combining phone clustering and non-native adaptation, the Phrase Error Rate (PER) of MESRS on English utterances was reduced by 24.47% relative to the baseline monolingual English system, while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. On bilingual utterances, MESRS achieved a 22.37% relative PER reduction.
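
    The abstract reports relative, not absolute, PER reductions. As a reminder of how such a figure is computed, here is a minimal sketch; the absolute PER values in it are hypothetical and chosen only to illustrate the arithmetic, since the abstract does not give the baseline error rates.

```python
def relative_reduction(baseline_per: float, system_per: float) -> float:
    """Return the relative PER reduction of a system over a baseline, in percent."""
    if baseline_per <= 0:
        raise ValueError("baseline PER must be positive")
    return 100.0 * (baseline_per - system_per) / baseline_per


# Hypothetical absolute PERs (percent); NOT taken from the paper.
baseline = 20.0
bilingual = 15.0
print(f"{relative_reduction(baseline, bilingual):.2f}% relative reduction")
```

    A relative reduction of 24.47% therefore means the error rate fell to about three quarters of the baseline value, whatever that baseline was.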

  • A One-Pass Real-Time Decoder Using Memory-Efficient State Network

    Jian SHAO  Ta LI  Qingqing ZHANG  Qingwei ZHAO  Yonghong YAN  

     
    PAPER-ASR System Architecture

      Vol:
    E91-D No:3
      Page(s):
    529-537

    This paper presents a decoder we developed that statically optimizes part of the knowledge sources while handling the others dynamically. The lexicon, phonetic contexts and acoustic model are statically integrated to form a memory-efficient state network, while the language model (LM) is dynamically incorporated on the fly by means of extended tokens. The novelties of our approach to constructing the state network are (1) introducing two layers of dummy nodes to cluster the cross-word (CW) context dependent fan-in and fan-out triphones, (2) introducing a so-called "WI layer" to store the word identities and putting the nodes of this layer in the non-shared mid-part of the network, and (3) optimizing the network at the state level by a sufficient forward and backward node-merge process. The state network is organized as a multi-layer structure for distinct token propagation at each layer. By exploiting the characteristics of the state network, several techniques including LM look-ahead, LM cache and beam pruning are specially designed for search efficiency. In beam pruning in particular, a layer-dependent pruning method is proposed to further reduce the search space. The layer-dependent pruning takes into account the neck-like characteristics of the WI layer and the reduced variety of word endings, which enables a tighter beam without introducing many search errors. In addition, other techniques including LM compression, lattice-based bookkeeping and lattice garbage collection are employed to reduce the memory requirements. Experiments are carried out on a Mandarin spontaneous speech recognition task where the decoder involves a trigram LM and CW triphone models. A comparison with HDecode from the HTK toolkit shows that, within a 1% performance deviation, our decoder runs 5 times faster with half the memory footprint.
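
    The idea of layer-dependent beam pruning can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` class, the layer labels, the beam widths, and the choice to measure every beam against the single best score in the frame are all assumptions made for illustration, since the abstract gives none of these details.

```python
from dataclasses import dataclass


@dataclass
class Token:
    layer: str    # hypothetical layer label, e.g. "state" or "WI"
    score: float  # accumulated log-likelihood; higher is better


def prune(tokens, beams, default_beam):
    """Keep tokens whose score lies within their layer's beam of the best score.

    `beams` maps a layer label to its beam width; layers not listed use
    `default_beam`. The reference score is the best over all tokens (an assumption).
    """
    if not tokens:
        return []
    best = max(t.score for t in tokens)
    return [t for t in tokens if t.score >= best - beams.get(t.layer, default_beam)]


tokens = [
    Token("state", -10.0),
    Token("state", -55.0),
    Token("WI", -30.0),
    Token("WI", -45.0),
]
# A tighter (hypothetical) beam is applied at the WI layer only.
survivors = prune(tokens, beams={"WI": 25.0}, default_beam=50.0)
print([(t.layer, t.score) for t in survivors])
```

    With these made-up numbers, the WI token at -45.0 is dropped by the tighter WI beam even though a state-layer token with a worse score (-55.0) survives under the default beam.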