
Author Search Result

[Author] Shoichi MATSUNAGA (5 hits)

  • Topic Extraction based on Continuous Speech Recognition in Broadcast News Speech

    Katsutoshi OHTSUKI  Tatsuo MATSUOKA  Shoichi MATSUNAGA  Sadaoki FURUI  

     
    PAPER-Speech and Hearing

    Vol: E85-D No:7  Page(s): 1138-1144

    In this paper, we propose topic extraction models based on statistical relevance scores between topic words and words in articles, and report the results of topic extraction experiments using continuous speech recognition of Japanese broadcast news utterances. We represent the topic of a news item as a combination of multiple topic words: words that are important in the news article or relevant to it. We statistically model the mapping from words in an article to topic words; using this mapping, the topic extraction model can extract topic words even if they do not appear in the article. We train a topic extraction model that computes the degree of relevance between a topic word and a word in an article on newspaper text covering a five-year period, calculating the degree of relevance with measures such as mutual information or the χ2 statistic. In experiments extracting five topic words with a χ2-based model, we achieve 72% precision and 12% recall on speech recognition results. Recognition results generally contain errors, which degrade topic extraction performance. To mitigate this, we employ N-best candidates together with the likelihoods given by the acoustic and language models; in our experiments, extracting five topic words from N-best candidates weighted by these likelihoods significantly improves precision.
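
    As a rough illustration of the χ2-based relevance scoring described above, the following sketch scores candidate topic words against the words of an article with a 2x2 contingency-table χ2 statistic and keeps the five highest-scoring candidates. The count tables, candidate vocabulary, and function names are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def chi2(n11, n1_, n_1, n):
    """Chi-squared statistic for a 2x2 contingency table.
    n11: documents containing both words, n1_: documents containing word w,
    n_1: documents containing topic word t, n: total documents."""
    n10, n01 = n1_ - n11, n_1 - n11
    n00 = n - n11 - n10 - n01
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

def extract_topic_words(article_words, topics, doc_freq, co_freq, n_docs, k=5):
    """Score every candidate topic word by its summed chi-squared relevance
    to the distinct words of the article; return the k best candidates.
    A topic word can be chosen even if it never occurs in the article."""
    scores = Counter()
    for t in topics:
        for w in set(article_words):
            scores[t] += chi2(co_freq.get((w, t), 0), doc_freq.get(w, 0),
                              doc_freq.get(t, 0), n_docs)
    return [t for t, _ in scores.most_common(k)]
```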

  • Improved Phoneme-History-Dependent Search Method for Large-Vocabulary Continuous-Speech Recognition

    Takaaki HORI  Yoshiaki NODA  Shoichi MATSUNAGA  

     
    PAPER-Speech and Hearing

    Vol: E86-D No:6  Page(s): 1059-1067

    This paper presents an improved phoneme-history-dependent (PHD) search algorithm. The method is optimal under the assumption that the starting time of a recognized word depends on only a few preceding phonemes (the phoneme history). Its computational cost and recognition error rate can both be reduced by re-selecting the preceding word and choosing an appropriate phoneme-history length; these improvements speed up decoding and help ensure that the resulting word graph contains the correct word sequence. On a 65k-word domain-independent Japanese read-speech dictation task and a 1000-word spontaneous-speech airline-ticket-reservation task, the improved PHD search was 1.2-1.8 times faster than a traditional word-dependent search at equal word accuracy, and it reduced the number of errors by up to 21% at equal processing time. The results also show that our search generates more compact and accurate word graphs than the original PHD search method. In addition, we investigated the optimum length of the phoneme history in the search.
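
    The core idea of a phoneme-history-dependent search, that partial hypotheses ending at the same frame can be merged when they share the same last few phonemes, can be sketched as below. The hypothesis structure, the scores, and the word-graph bookkeeping a real decoder would need are illustrative assumptions, not the authors' decoder.

```python
from dataclasses import dataclass

@dataclass
class Hyp:
    words: tuple    # word sequence recognized so far
    phones: tuple   # phoneme sequence underlying those words
    score: float    # accumulated acoustic + language log-likelihood

def merge_hypotheses(hyps, history_len=2):
    """Keep only the best-scoring hypothesis for each phoneme history of
    the given length; under the PHD assumption, the discarded hypotheses
    cannot change the starting time of any later word."""
    best = {}
    for h in hyps:
        key = h.phones[-history_len:]  # the phoneme history
        if key not in best or h.score > best[key].score:
            best[key] = h
    return list(best.values())
```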

  • Unsupervised Speaker Adaptation Using All-Phoneme Ergodic Hidden Markov Network

    Yasunaga MIYAZAWA  Jun-ichi TAKAMI  Shigeki SAGAYAMA  Shoichi MATSUNAGA  

     
    PAPER-Speech Processing and Acoustics

    Vol: E78-D No:8  Page(s): 1044-1050

    This paper proposes an unsupervised speaker adaptation method using an "all-phoneme ergodic Hidden Markov Network" that combines allophonic (context-dependent phone) acoustic models with stochastic language constraints. A Hidden Markov Network (HMnet) for allophone modeling and allophonic bigram probabilities derived from a large text database are combined to yield a single large ergodic HMM that represents arbitrary speech signals in a particular language, so that the model parameters can be re-estimated from text-unknown speech samples with the Baum-Welch algorithm. Combined with the Vector Field Smoothing (VFS) technique, this enables effective unsupervised speaker adaptation. In experiments, the method outperformed our previous unsupervised adaptation method, which used conventional phonetic HMMs and phoneme bigram probabilities, especially when the amount of training data was small.
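
    A minimal sketch of how allophone models and allophonic bigram probabilities might be composed into one ergodic HMM is given below, under simplifying assumptions: each allophone unit is a small HMM with a per-state exit probability, and the emission parameters and the Baum-Welch re-estimation itself are omitted. The data layout is an illustrative assumption, not the authors' HMnet structure.

```python
import numpy as np

def build_ergodic_transitions(units, bigram):
    """units: {name: (A, exit_p)}, where A is the (S, S) within-unit
    transition matrix of an allophone HMM and exit_p[s] is the probability
    of leaving the unit from state s (each row of A sums to 1 - exit_p[s]).
    bigram: {(prev, next): P(next | prev)} over allophone names.
    Returns one large ergodic transition matrix over all states."""
    names = list(units)
    offsets, total = {}, 0
    for n in names:
        offsets[n] = total
        total += units[n][0].shape[0]
    T = np.zeros((total, total))
    for n in names:
        A, exit_p = units[n]
        o = offsets[n]
        T[o:o + A.shape[0], o:o + A.shape[1]] += A  # within-unit transitions
        for m in names:  # leave unit n for the entry state of unit m
            T[o:o + len(exit_p), offsets[m]] += exit_p * bigram.get((n, m), 0.0)
    return T
```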

  • Speaker-Consistent Parsing for Speaker-Independent Continuous Speech Recognition

    Kouichi YAMAGUCHI  Harald SINGER  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 719-724

    This paper describes a novel speaker-independent speech recognition method, called "speaker-consistent parsing", which is based on an intra-speaker correlation called the speaker-consistency principle. We focus on the fact that a sentence or a string of words is uttered by an individual speaker even in a speaker-independent task. Thus, the proposed method searches through speaker variations in addition to the contents of utterances. As a result of the recognition process, an appropriate standard speaker is selected for speaker adaptation. This new method is experimentally compared with a conventional speaker-independent speech recognition method. Since the speaker-consistency principle best demonstrates its effect with a large number of training and test speakers, a small-scale experiment may not fully exploit this principle. Nevertheless, even the results of our small-scale experiment show that the new method significantly outperforms the conventional method. In addition, this framework's speaker selection mechanism can drastically reduce the likelihood map computation.
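
    The speaker-consistency principle lends itself to a simple sketch: decode the whole utterance once per standard speaker, so that every word in a hypothesis is explained by the same speaker's models, and keep the best-scoring speaker. The `decode` callable and the model set below are hypothetical placeholders, not the authors' system.

```python
def speaker_consistent_recognize(utterance, speaker_models, decode):
    """Decode the utterance against each standard speaker's models and
    return (word_sequence, speaker) for the best-scoring speaker, so the
    whole utterance is attributed to one consistent speaker."""
    best_score, best_words, best_spk = float("-inf"), None, None
    for spk, models in speaker_models.items():
        words, score = decode(utterance, models)
        if score > best_score:
            best_score, best_words, best_spk = score, words, spk
    return best_words, best_spk  # best_spk can then seed speaker adaptation
```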

  • Speech Recognition Using Function-Word N-Grams and Content-Word N-Grams

    Ryosuke ISOTANI  Shoichi MATSUNAGA  Shigeki SAGAYAMA  

     
    PAPER

    Vol: E78-D No:6  Page(s): 692-697

    This paper proposes a new stochastic language model for speech recognition based on function-word N-grams and content-word N-grams. Conventional word N-gram models are effective for speech recognition, but they represent only local constraints within a few successive words and lack the ability to capture global syntactic or semantic relationships between words. To represent more global constraints, the proposed language model gives the N-gram probabilities of word sequences with attention restricted either to function words or to content words; the sequences of function words and of content words are expected to represent syntactic and semantic constraints, respectively. Probabilities of function-word bigrams and content-word bigrams were estimated from a 10,000-sentence text database, and analysis using an information-theoretic measure showed that the expected constraints were extracted appropriately. As an application of this model to speech recognition, a post-processor was constructed to select the optimum sentence candidate from a phrase lattice produced by a phrase recognition system; the phrase candidate sequence with the highest total acoustic and linguistic score was found by dynamic programming. Experiments on the utterances of 12 speakers showed that the proposed method is more accurate than a CFG-based method, demonstrating its effectiveness in improving speech recognition performance.
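
    As a rough sketch of this kind of scoring, the linguistic score of a word sequence can be computed by splitting it into its function-word and content-word subsequences and scoring each with its own bigram model. The probability tables, the smoothing constant, and the part-of-speech predicate are illustrative assumptions, not the authors' estimates.

```python
import math

def subsequence_logprob(words, bigram, unk=1e-7):
    """Sum log P(w_i | w_{i-1}) over one subsequence, padded with <s>;
    unseen bigrams fall back to a small floor probability."""
    seq = ["<s>"] + words
    return sum(math.log(bigram.get((a, b), unk))
               for a, b in zip(seq, seq[1:]))

def linguistic_score(words, is_function_word, func_bigram, cont_bigram):
    """Split the sequence into function and content words and score each
    subsequence with its own bigram model, as the abstract describes."""
    func = [w for w in words if is_function_word(w)]
    cont = [w for w in words if not is_function_word(w)]
    return (subsequence_logprob(func, func_bigram)
            + subsequence_logprob(cont, cont_bigram))
```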