The search functionality is under construction.

Author Search Result

[Author] Jinsong ZHANG(3hit)

1-3hit
  • Noise and Channel Distortion Robust ASR System for DARPA SPINE2 Task

    Konstantin MARKOV  Tomoko MATSUI  Rainer GRUHN  Jinsong ZHANG  Satoshi NAKAMURA  

     
    PAPER-Robust Speech Recognition and Enhancement

      Vol:
    E86-D No:3
      Page(s):
    497-504

    This paper presents the ATR speech recognition system designed for the DARPA SPINE2 evaluation task. The system is capable of dealing with speech from highly variable, real-world noisy conditions and communication channels. A number of robust techniques are implemented, such as differential spectrum mel-scale cepstrum features, on-line MLLR adaptation, and word-level hypothesis combination, which led to a significant reduction in the word error rate.

  • Articulatory Modeling for Pronunciation Error Detection without Non-Native Training Data Based on DNN Transfer Learning

    Richeng DUAN  Tatsuya KAWAHARA  Masatake DANTSUJI  Jinsong ZHANG  

     
    PAPER-Speech and Hearing

      Pubricized:
    2017/05/26
      Vol:
    E100-D No:9
      Page(s):
    2174-2182

    Aiming at detecting pronunciation errors produced by second language learners and providing corrective feedbacks related with articulation, we address effective articulatory models based on deep neural network (DNN). Articulatory attributes are defined for manner and place of articulation. In order to efficiently train these models of non-native speech without such data, which is difficult to collect in a large scale, several transfer learning based modeling methods are explored. We first investigate three closely-related secondary tasks which aim at effective learning of DNN articulatory models. We also propose to exploit large speech corpora of native and target language to model inter-language phenomena. This kind of transfer learning can provide a better feature representation of non-native speech. Related task transfer and language transfer learning are further combined on the network level. Compared with the conventional DNN which is used as the baseline, all proposed methods improved the performance. In the native attribute recognition task, the network-level combination method reduced the recognition error rate by more than 10% relative for all articulatory attributes. The method was also applied to pronunciation error detection in Mandarin Chinese pronunciation learning by Japanese native speakers, and achieved the relative improvement up to 17.0% for detection accuracy and up to 19.9% for F-score, which is also better than the lattice-based combination.

  • Phoneme Set Design for Speech Recognition of English by Japanese

    Xiaoyun WANG  Jinsong ZHANG  Masafumi NISHIDA  Seiichi YAMAMOTO  

     
    PAPER-Speech and Hearing

      Pubricized:
    2014/10/01
      Vol:
    E98-D No:1
      Page(s):
    148-156

    This paper describes a novel method to improve the performance of second language speech recognition when the mother tongue of users is known. Considering that second language speech usually includes less fluent pronunciation and more frequent pronunciation mistakes, the authors propose using a reduced phoneme set generated by a phonetic decision tree (PDT)-based top-down sequential splitting method instead of the canonical one of the second language. The authors verify the efficacy of the proposed method using second language speech collected with a translation game type dialogue-based English CALL system. Experiments show that a speech recognizer achieved higher recognition accuracy with the reduced phoneme set than with the canonical phoneme set.