Keyword Search Result

[Keyword] spontaneous speech (7 hits)

  • Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation (Open Access)

    Ryo MASUMURA  Taichi ASAMI  Takanobu OBA  Hirokazu MASATAKI  Sumitaka SAKAUCHI  Akinori ITO  

     
    PAPER-Language modeling

      Publicized:
    2016/07/19
      Vol:
    E99-D No:10
      Page(s):
    2452-2461

    This paper aims to investigate the performance improvements made possible by combining various major language model (LM) technologies and to reveal the interactions between LM technologies in spontaneous automatic speech recognition tasks. While it is clear that recent practical LMs have several problems, isolated use of major LM technologies does not appear to offer sufficient performance. In consideration of this fact, combining various LM technologies has also been examined. However, previous works focused only on modeling technologies with limited text resources, and did not consider other important technologies in practical language modeling, i.e., use of external text resources and unsupervised adaptation. This paper, therefore, employs not only manual transcriptions of target speech recognition tasks but also external text resources. In addition, unsupervised LM adaptation based on multi-pass decoding is added to the combination. We divide LM technologies into three categories and employ key ones, including recurrent neural network LMs and discriminative LMs. Our experiments show the effectiveness of combining various LM technologies not only in in-domain tasks, the subject of our previous work, but also in out-of-domain tasks. Furthermore, we reveal the relationships between the technologies in both tasks.

  • Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    Hansjorg HOFMANN  Sakriani SAKTI  Chiori HORI  Hideki KASHIOKA  Satoshi NAKAMURA  Wolfgang MINKER  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:8
      Page(s):
    2084-2093

    The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, the phonemes are first recognized with the present recognition system, and afterwards the pronunciation variation model based on the noisy channel approach maps from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone spontaneous speech.
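
    The noisy channel idea mentioned in this abstract can be written in its standard textbook form. The notation here is an assumption for illustration and is not taken from the paper: $S$ stands for the recognized ("noisy") phoneme sequence and $W$ for a candidate ("clean") word sequence.

    ```latex
    \hat{W} = \arg\max_{W} P(W \mid S)
            = \arg\max_{W} \frac{P(S \mid W)\, P(W)}{P(S)}
            = \arg\max_{W} P(S \mid W)\, P(W)
    ```

    The last step drops $P(S)$ because it does not depend on $W$. In this generic reading, $P(S \mid W)$ plays the role of a channel model (how a word sequence is realized as spontaneous phonemes) and $P(W)$ that of a source model over word sequences.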

  • Recent Progress in Corpus-Based Spontaneous Speech Recognition

    Sadaoki FURUI  

     
    INVITED PAPER

      Vol:
    E88-D No:3
      Page(s):
    366-375

    This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is spontaneous in almost any situation, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance for spontaneous speech. For this purpose, it is necessary to build large spontaneous speech corpora for constructing acoustic and language models. This paper focuses on various achievements of a Japanese 5-year national project "Spontaneous Speech: Corpus and Processing Technology" that has recently been completed. Because of various phenomena specific to spontaneous speech, such as filled pauses, repairs, hesitations, repetitions and disfluencies, recognition of spontaneous speech requires various new techniques. These new techniques include flexible acoustic modeling, sentence boundary detection, pronunciation modeling, acoustic as well as language model adaptation, and automatic summarization. In particular, automatic summarization including indexing, a process which extracts important and reliable parts of the automatic transcription, is expected to play an important role in building various speech archives, speech-based information retrieval systems, and human-computer dialogue systems.

  • An Unsupervised Speaker Adaptation Method for Lecture-Style Spontaneous Speech Recognition Using Multiple Recognition Systems

    Seiichi NAKAGAWA  Tomohiro WATANABE  Hiromitsu NISHIZAKI  Takehito UTSURO  

     
    PAPER-Spoken Language Systems

      Vol:
    E88-D No:3
      Page(s):
    463-471

    This paper describes an accurate unsupervised speaker adaptation method for lecture-style spontaneous speech recognition using multiple LVCSR systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance obtained by adapting acoustic models depends heavily on the accuracy of labels such as phonemes and syllables. Therefore, extraction of the adaptation data guided by a confidence measure is effective for unsupervised adaptation. In this paper, we looked for high-confidence portions based on the agreement between two LVCSR systems, adapted acoustic models using those portions with their highly accurate labels, and thereby improved the recognition accuracy. We applied our method to the Corpus of Spontaneous Japanese (CSJ), and the method improved the recognition rate by about 2.1% in comparison with a traditional method.
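
    The agreement-based selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `agreed_segments`, the `min_len` threshold, and the use of plain word-level alignment (with no timing information or phoneme labels) are all assumptions made for illustration.

    ```python
    from difflib import SequenceMatcher


    def agreed_segments(hyp_a, hyp_b, min_len=2):
        """Return runs of words on which two recognizer outputs agree.

        hyp_a, hyp_b: word lists from two (hypothetical) recognition systems.
        min_len: assumed minimum run length to count as a confident portion.
        """
        matcher = SequenceMatcher(a=hyp_a, b=hyp_b, autojunk=False)
        segments = []
        for block in matcher.get_matching_blocks():
            # The final block returned by difflib is a zero-length sentinel.
            if block.size >= min_len:
                segments.append(hyp_a[block.a:block.a + block.size])
        return segments


    # Toy hypotheses; the sentences are invented for the example.
    hyp_a = "we propose a new method for speech recognition".split()
    hyp_b = "we propose the new method for speech recondition".split()
    print(agreed_segments(hyp_a, hyp_b))
    ```

    In this toy setting, only the word runs that both outputs share would be kept as adaptation data, while the positions where the systems disagree are discarded.
    
    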

  • Dynamic Bayesian Network-Based Acoustic Models Incorporating Speaking Rate Effects

    Takahiro SHINOZAKI  Sadaoki FURUI  

     
    PAPER-Speech and Hearing

      Vol:
    E87-D No:10
      Page(s):
    2339-2347

    One of the most important issues in spontaneous speech recognition is how to cope with the degradation of recognition accuracy due to speaking rate fluctuation within an utterance. This paper proposes an acoustic model for adjusting mixture weights and transition probabilities of the HMM for each frame according to the local speaking rate. The proposed model is implemented along with variants and conventional models using the Bayesian network framework. The proposed model has a hidden variable representing variation of the "mode" of the speaking rate, and its value controls the parameters of the underlying HMM. Model training and maximum probability assignment of the variables are conducted using the EM/GEM and inference algorithms for the Bayesian networks. Utterances from meetings and lectures are used for evaluation where the Bayesian network-based acoustic models are used to rescore the likelihood of the N-best lists. In the experiments, the proposed model indicated consistently higher performance than conventional HMMs and regression HMMs using the same speaking rate information.
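
    N-best rescoring, as mentioned in this abstract, can be sketched generically as follows. The `Hypothesis` fields, the `lm_weight` parameter, and the simple additive score combination are assumptions for illustration; the abstract does not specify how the scores are combined.

    ```python
    from typing import NamedTuple


    class Hypothesis(NamedTuple):
        words: str
        lm_score: float  # log-probability from the language model
        am_score: float  # log-likelihood from the first-pass acoustic model


    def rescore(nbest, new_am_scores, lm_weight=1.0):
        """Pick the best hypothesis after replacing the acoustic score.

        new_am_scores: one log-likelihood per hypothesis, as would be produced
        by a second (e.g. speaking-rate-aware) acoustic model.
        """
        rescored = [
            (new_am + lm_weight * hyp.lm_score, hyp.words)
            for hyp, new_am in zip(nbest, new_am_scores)
        ]
        # max() compares the combined score first; ties fall back to the words.
        return max(rescored)[1]


    # Toy scores, invented for the example.
    nbest = [
        Hypothesis("a b", lm_score=-2.0, am_score=-10.0),
        Hypothesis("a c", lm_score=-2.5, am_score=-10.5),
    ]
    best = rescore(nbest, new_am_scores=[-11.0, -9.0])
    ```

    Here the first-pass ranking would prefer "a b", but the second model's scores move "a c" to the top of the list.
    
    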

  • Design and Construction of an Advisory Dialogue Database

    Tadahiko KUMAMOTO  Akira ITO  Tsuyoshi EBINA  

     
    PAPER-Databases

      Vol:
    E78-D No:4
      Page(s):
    420-427

    We are aiming to develop a computer-based consultant system which helps novice computer users to achieve their task goals on computers through natural language dialogues. Our target is spoken Japanese. To develop effective methods for processing spoken Japanese, it is essential to analyze real dialogues and to find the characteristics of spoken Japanese. In this paper, we discuss the design problems associated with constructing a spoken dialogue database from the viewpoint of advisory dialogue collection, describe XMH (X-window-based electronic mail handling program) usage experiments made to collect advisory dialogues between novice XMH users and an expert consultant, and show the dialogue database we constructed from these dialogues. The main features of our database are as follows: (1) Our target dialogues were advisory ones. (2) The advisory dialogues were all related to the use of XMH, which has a visual interface operated by a keyboard and a mouse. (3) The primary objective of the users was not to engage in dialogues but to achieve specific task goals using XMH. (4) Not only what the users said but also the XMH operations performed by the users are included as dialogue elements. This kind of dialogue database is a very effective source for developing new methods for processing spoken language in multimodal consultant systems, and we have therefore made it available to the public. Based on our analysis of the database, we have already developed several effective methods, such as a method for recognizing a user's communicative intention from a transcript of spoken Japanese, and a method for controlling dialogues between a novice XMH user and the computer-based consultant system which we are developing. Also, we have proposed several response generation rules as the response strategy for the consultant system. We have developed an experimental consultant system by implementing the above methods and strategy.

  • System Design, Data Collection and Evaluation of a Speech Dialogue System

    Katunobu ITOU  Satoru HAYAMIZU  Kazuyo TANAKA  Hozumi TANAKA  

     
    PAPER

      Vol:
    E76-D No:1
      Page(s):
    121-127

    This paper describes design issues of a speech dialogue system, the evaluation of the system, and the data collection of spontaneous speech in a transportation guidance domain. As it is difficult to collect spontaneous speech and to use a real system for the collection and evaluation, the phenomena related to dialogues have not yet been quantitatively clarified. The authors constructed a speech dialogue system which operates in almost real time, with acceptable recognition accuracy and flexible dialogue control. The system was used for spontaneous speech collection in a transportation guidance domain. The system performance evaluated in the domain is an understanding rate of 84.2% for the utterances within the predefined grammar and lexicon. Some statistics of the collected spontaneous speech are also given.