Author Search Result

[Author] Hideki KASHIOKA(6hit)

1-6hit
  • Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    Hansjorg HOFMANN  Sakriani SAKTI  Chiori HORI  Hideki KASHIOKA  Satoshi NAKAMURA  Wolfgang MINKER  

     
    PAPER-Speech and Hearing

    Vol: E95-D  No: 8  Page(s): 2084-2093

    The performance of English automatic speech recognition (ASR) systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, the phonemes are first recognized with the present recognition system, and afterwards the pronunciation variation model based on the noisy channel approach maps from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both are applied, and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
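
The noisy channel idea in the abstract above can be illustrated with a toy decoder: pick the word sequence W that maximizes P(phonemes | W) × P(W). This is only a minimal sketch under stated assumptions, not the authors' system. The probability tables, the phoneme string, and the function name `noisy_channel_decode` are all hypothetical; the paper itself uses joint-sequence models and statistical machine translation to estimate these quantities.

```python
import math

# Hypothetical toy tables; a real system would estimate these from data.
# PRIOR[w] stands in for the language model P(W).
PRIOR = {"want to": 0.6, "wand two": 0.4}
# CHANNEL[(phonemes, w)] stands in for the channel model P(phonemes | W).
CHANNEL = {
    ("w aa n ax", "want to"): 0.30,
    ("w aa n ax", "wand two"): 0.01,
}


def noisy_channel_decode(observed, candidates, channel, prior):
    """Return the candidate maximizing log P(observed | W) + log P(W)."""

    def score(word_seq):
        p = channel.get((observed, word_seq), 0.0) * prior.get(word_seq, 0.0)
        # Unseen pairs get probability zero, i.e. a score of -inf.
        return math.log(p) if p > 0 else float("-inf")

    return max(candidates, key=score)


best = noisy_channel_decode("w aa n ax", ["want to", "wand two"], CHANNEL, PRIOR)
print(best)
```

Working in log space is a common design choice because products of many small probabilities underflow quickly for longer sequences.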

  • A Speech Translation System Applied to a Real-World Task/Domain and Its Evaluation Using Real-World Speech Data

    Atsushi NAKAMURA  Masaki NAITO  Hajime TSUKADA  Rainer GRUHN  Eiichiro SUMITA  Hideki KASHIOKA  Hideharu NAKAJIMA  Tohru SHIMIZU  Yoshinori SAGISAKA  

     
    PAPER-Speech and Hearing

    Vol: E84-D  No: 1  Page(s): 142-154

    This paper describes an application of a speech translation system to another task/domain in the real world by using developmental data collected from real-world interactions. The total cost of this task alteration was calculated to be 9 person-months. The newly applied system was also evaluated using speech data collected from real-world interactions. For real-world speech with a machine-friendly speaking style, the newly applied system could recognize typical sentences with a word accuracy of 90% or better. We also found that, in terms of overall speech translation performance, the system could translate about 80% of the input Japanese speech into acceptable English sentences.

  • Consolidation-Based Speech Translation and Evaluation Approach

    Chiori HORI  Bing ZHAO  Stephan VOGEL  Alex WAIBEL  Hideki KASHIOKA  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

    Vol: E92-D  No: 3  Page(s): 477-488

    The performance of speech translation systems combining automatic speech recognition (ASR) and machine translation (MT) systems is degraded by redundant and irrelevant information caused by speaker disfluency and recognition errors. This paper proposes a new approach to translating speech recognition results through speech consolidation, which removes ASR errors and disfluencies and extracts meaningful phrases. The consolidation approach is spun off from speech summarization by word extraction from the ASR 1-best. We extended the consolidation approach to confusion networks (CNs), tested its performance using TED speech, and confirmed that the consolidation results preserved more meaningful phrases than the original ASR results. We then applied the consolidation technique to speech translation. To test the performance of consolidation-based speech translation, Chinese broadcast news (BN) speech in RT04 was recognized, consolidated, and then translated. The speech translation results obtained via consolidation cannot be directly compared with gold standards in which all words in the speech are translated, because consolidation-based translations are partial translations. We propose a new evaluation framework for partial translations that compares them with the most similar set of words extracted from a word network created by merging gradual summarizations of the gold standard translation. The performance of consolidation-based MT results was evaluated using BLEU. We also propose Information Preservation Accuracy (IPAccy) and Meaning Preservation Accuracy (MPAccy) to evaluate consolidation and consolidation-based MT. We confirmed that consolidation contributed to the performance of speech translation.

  • FOREWORD Open Access

    Hideki KASHIOKA  

     
    FOREWORD

    Vol: E99-D  No: 6  Page(s): 1436-1436

  • Non-Audible Murmur (NAM) Recognition

    Yoshitaka NAKAJIMA  Hideki KASHIOKA  Nick CAMPBELL  Kiyohiro SHIKANO  

     
    PAPER

    Vol: E89-D  No: 1  Page(s): 1-8

    We propose a new practical input interface for the recognition of Non-Audible Murmur (NAM), which is defined as articulated respiratory sound without vocal-fold vibration transmitted through the soft tissues of the head. We developed a microphone attachment, which adheres to the skin, by applying the principle of a medical stethoscope, found the ideal position for sampling flesh-conducted NAM sound vibration and retrained an acoustic model with NAM samples. Then using the Julius Japanese Dictation Toolkit, we tested the feasibility of using this method in place of an external microphone for analyzing air-conducted voice sound.

  • An Unsupervised Model of Redundancy for Answer Validation

    Youzheng WU  Hideki KASHIOKA  Satoshi NAKAMURA  

     
    PAPER-Natural Language Processing

    Vol: E93-D  No: 3  Page(s): 624-634

    Given a question and a set of its candidate answers, the task of answer validation (AV) aims to return a Boolean value indicating whether a given candidate answer is the correct answer to the question. Unlike previous works, this paper presents an unsupervised model, called the U-model, for AV. This approach regards AV as a classification task and investigates how to effectively incorporate the redundancy of the Web into the proposed architecture. Experimental results with TREC factoid test sets and Chinese test sets indicate that the proposed U-model with redundancy information is very effective for AV. For example, the top@1/mrr@5 scores on the TREC05 and TREC06 tracks are 40.1/51.5% and 35.8/47.3%, respectively. Furthermore, a cross-model comparison experiment demonstrates that the U-model is the best among the redundancy-based models considered. Even compared with a syntax-based approach, a supervised machine learning approach, and a pattern-based approach, the U-model performs much better.
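
The top@1 and mrr@5 scores quoted in the abstract above are not defined in this listing. The sketch below assumes the usual definitions: top@1 is the fraction of questions whose top-ranked candidate is correct, and mrr@5 is the mean reciprocal rank of the first correct candidate within the top five (zero if none). The function names, the toy data, and the exact-string matching are illustrative assumptions, not taken from the paper.

```python
def top_at_1(ranked, gold):
    """Fraction of questions whose first-ranked candidate equals the gold answer."""
    hits = sum(1 for q, cands in ranked.items() if cands and cands[0] == gold[q])
    return hits / len(ranked)


def mrr_at_k(ranked, gold, k=5):
    """Mean reciprocal rank of the first correct candidate within the top k."""
    total = 0.0
    for q, cands in ranked.items():
        for rank, cand in enumerate(cands[:k], start=1):
            if cand == gold[q]:
                total += 1.0 / rank
                break  # only the first correct candidate counts
    return total / len(ranked)


# Hypothetical ranked candidate lists and gold answers for three questions.
ranked = {"q1": ["a", "b"], "q2": ["x", "y", "z"], "q3": ["m", "n"]}
gold = {"q1": "a", "q2": "y", "q3": "p"}

top1 = top_at_1(ranked, gold)      # only q1 is correct at rank 1
mrr5 = mrr_at_k(ranked, gold, 5)   # reciprocal ranks: 1, 1/2, 0
print(f"top@1 = {top1:.3f}, mrr@5 = {mrr5:.3f}")
```

In practice, answer matching for factoid questions is usually done with answer patterns rather than exact string equality, so the comparison step would need to be replaced accordingly.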