
Keyword Search Result

[Keyword] corpus(44hit)

21-40hit(44hit)

  • Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

    Goshu NAGINO  Makoto SHOZAKAI  Tomoki TODA  Hiroshi SARUWATARI  Kiyohiro SHIKANO  

     
    PAPER-Corpus

    Vol: E91-D No:3  Page(s): 607-614

    This paper proposes a technique for building an effective speech corpus at lower cost by utilizing a statistical multidimensional scaling method, which visualizes multiple HMM acoustic models in a two-dimensional space. First, a small number of voice samples is collected per speaker; speaker-adapted acoustic models trained with the collected utterances are mapped into the two-dimensional space by the statistical multidimensional scaling method. Next, speakers located on the periphery of the distribution in the plotted map are selected, and a speech corpus is built by collecting enough voice samples from the selected speakers. In an experiment on building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers, a cost reduction of more than 62%. In an experiment on building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers, a cost reduction of more than 57%.
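
The two cost-reduction figures in this abstract can be checked with a few lines of Python. This is only a sketch under the assumption that collection cost scales linearly with the number of speakers recorded; the function name `cost_reduction` is illustrative and does not come from the paper.

```python
def cost_reduction(selected_speakers: int, baseline_speakers: int) -> float:
    """Fraction of cost saved, assuming cost is proportional to speaker count."""
    if baseline_speakers <= 0:
        raise ValueError("baseline_speakers must be positive")
    return 1.0 - selected_speakers / baseline_speakers


# Speaker counts quoted in the abstract above.
isolated = cost_reduction(200, 533)
continuous = cost_reduction(500, 1179)
print(f"isolated-word corpus:   {isolated:.1%}")
print(f"continuous-word corpus: {continuous:.1%}")
```

Both values land just above the thresholds quoted in the abstract (more than 62% and more than 57%).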

  • An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus

    Jin-Song ZHANG  Satoshi NAKAMURA  

     
    PAPER-Corpus

    Vol: E91-D No:3  Page(s): 615-630

    An efficient way to develop large-scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called the minimum set, should have a small text size in order to reduce the collection cost. It can be selected by a greedy search algorithm from a large mother text corpus. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units increases dramatically, making the search nontrivial. In order to improve the search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluates these algorithms in order to show their different characteristics. The experimental results show that the least-to-most-ordered methods successfully achieve smaller objective sets in significantly less computation time than the conventional ones. This algorithm has already been applied to the development of a number of speech corpora, including a large-scale phonetically rich Chinese speech corpus, ATRPTH, which played an important role in developing our multi-language translation system.
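
To make the selection problem concrete, here is a minimal sketch of a conventional greedy search over a toy mother corpus. It is not the least-to-most-ordered variant evaluated in the paper, whose details the abstract does not give. The sentence ids and the unit strings such as `"a-b"` are hypothetical placeholders for phonetic contextual units.

```python
def greedy_select(sentences: dict[str, set[str]]) -> list[str]:
    """Pick sentences until every unit in the mother corpus is covered.

    At each step, choose the sentence covering the most still-uncovered
    units; ties go to the smallest sentence id so the result is deterministic.
    """
    uncovered = set().union(*sentences.values())
    chosen: list[str] = []
    while uncovered:
        best = max(sorted(sentences), key=lambda sid: len(sentences[sid] & uncovered))
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen


# Hypothetical mother corpus: sentence id -> set of units it contains.
mother = {
    "s1": {"a-b", "b-c"},
    "s2": {"b-c", "c-d", "d-e"},
    "s3": {"a-b"},
    "s4": {"d-e", "e-f"},
}
print(greedy_select(mother))
```

Each iteration rescans every sentence, which is why the search becomes expensive as the number of distinct units grows.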

  • An EM-Based Approach for Mining Word Senses from Corpora

    Thatsanee CHAROENPORN  Canasai KRUENGKRAI  Thanaruk THEERAMUNKONG  Virach SORNLERTLAMVANICH  

     
    PAPER-Natural Language Processing

    Vol: E90-D No:4  Page(s): 775-782

    Manually collecting contexts of a target word and grouping them based on their meanings yields a set of word senses, but the task is quite tedious. Towards automated lexicography, this paper proposes a word-sense discrimination method based on two modern techniques: the EM algorithm and principal component analysis (PCA). The spherical Gaussian EM algorithm, enhanced with PCA for robust initialization, is proposed to cluster word senses of a target word automatically. Three variants of the algorithm, namely PCA, sGEM, and PCA-sGEM, are investigated using a gold standard dataset of two polysemous words. The clustering result is evaluated using the measures of purity and entropy as well as a more recent measure called normalized mutual information (NMI). The experimental results indicate that the proposed algorithms achieve promising performance in discriminating word senses and that PCA-sGEM outperforms the other two methods to some extent.
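
Two of the evaluation measures named in this abstract, purity and NMI, can be sketched in plain Python. The abstract does not say which NMI normalization the authors used; the version below divides by the geometric mean of the two entropies, which is one common choice and an assumption here. The helper `_entropy` is the Shannon entropy of a labeling, used only for that normalization.

```python
from collections import Counter
from math import log, sqrt


def purity(clusters: list[int], labels: list[str]) -> float:
    """Fraction of items that carry the majority gold label of their cluster."""
    by_cluster: dict[int, Counter] = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(labels)


def _entropy(items: list) -> float:
    """Shannon entropy (natural log) of a list of discrete labels."""
    n = len(items)
    return -sum((k / n) * log(k / n) for k in Counter(items).values())


def nmi(clusters: list[int], labels: list[str]) -> float:
    """Mutual information normalized by sqrt(H(clusters) * H(labels))."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    pc, py = Counter(clusters), Counter(labels)
    mi = sum((k / n) * log(k * n / (pc[c] * py[y])) for (c, y), k in joint.items())
    denom = sqrt(_entropy(clusters) * _entropy(labels))
    return mi / denom if denom > 0 else 0.0


# Hypothetical cluster assignments and gold sense labels.
print(purity([0, 0, 1, 1], ["a", "b", "b", "b"]))
print(nmi([0, 0, 1, 1], ["a", "a", "b", "b"]))
```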

  • N-gram Adaptation with Dynamic Interpolation Coefficient Using Information Retrieval Technique

    Joon-Ki CHOI  Yung-Hwan OH  

     
    LETTER-Speech and Hearing

    Vol: E89-D No:9  Page(s): 2579-2582

    This study presents an N-gram adaptation technique for cases where additional text data for the adaptation do not exist. We use a language modeling approach to information retrieval (IR) to collect an appropriate adaptation corpus from the baseline text data. We propose using a dynamic interpolation coefficient to merge the N-grams, where the interpolation coefficient is estimated from the word hypotheses obtained by segmenting the input speech. Experimental results show that the proposed adapted N-gram always performs better than the background N-gram.
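
The merging step can be illustrated with standard linear interpolation of two N-gram models. This sketch takes the coefficient `lam` as a given input; the paper estimates it dynamically from word hypotheses, and the abstract does not describe that procedure. The probabilities are hypothetical, and an N-gram missing from one model is treated as probability 0.0 where a real model would back off.

```python
NGram = tuple[str, ...]


def interpolate_models(
    adapted: dict[NGram, float],
    background: dict[NGram, float],
    lam: float,
) -> dict[NGram, float]:
    """Return lam * P_adapted + (1 - lam) * P_background for every N-gram."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    ngrams = set(adapted) | set(background)
    return {
        g: lam * adapted.get(g, 0.0) + (1.0 - lam) * background.get(g, 0.0)
        for g in ngrams
    }


# Hypothetical bigram probabilities.
adapted = {("the", "car"): 0.4}
background = {("the", "car"): 0.1, ("the", "cat"): 0.2}
merged = interpolate_models(adapted, background, lam=0.5)
print(merged[("the", "car")], merged[("the", "cat")])
```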

  • Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes

    Takashi SAITO  

     
    PAPER-Speech Analysis

    Vol: E89-D No:3  Page(s): 1100-1106

    This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.

  • Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation

    Tatsuo YOTSUKURA  Shigeo MORISHIMA  Satoshi NAKAMURA  

     
    PAPER

    Vol: E88-D No:11  Page(s): 2477-2483

    An accurate audio-visual speech corpus is indispensable for talking-heads research. This paper presents our audio-visual speech corpus collection and proposes a head-movement normalization method and a facial motion generation method. The audio-visual corpus contains speech data, movie data of faces, and positions and movements of facial organs. The corpus consists of Japanese phoneme-balanced sentences uttered by a female native speaker. Accurate facial capture is realized by using an optical motion-capture system. We captured high-resolution 3D data by arranging many markers on the speaker's face. In addition, we propose a method of acquiring the facial movements and removing head movements by using an affine transformation to compute the displacements of the facial organs alone. Finally, in order to easily create facial animation from this motion data, we propose a technique for assigning the captured data to the facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and errors.

  • Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity

    Takao DOI  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E88-D No:6  Page(s): 1256-1264

    In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.
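
The similarity clue can be sketched with a word-level Levenshtein distance. The abstract only says the similarity is based on edit distance, so the normalization below (one minus distance over the longer length) and the function names are assumptions for illustration. The toy corpus is hypothetical.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def similarity(a: list[str], b: list[str]) -> float:
    """1.0 for identical sequences, 0.0 for maximally different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def closest_in_corpus(utterance: list[str], corpus: list[list[str]]) -> list[str]:
    """Corpus utterance most similar to the given one."""
    return max(corpus, key=lambda ref: similarity(utterance, ref))


# Hypothetical training utterances.
corpus = [
    "where is the bus station".split(),
    "how much is this".split(),
]
query = "where is the station".split()
print(closest_in_corpus(query, corpus), similarity(query, corpus[0]))
```

A splitting candidate whose parts each score high against the corpus would, under the assumption stated in the abstract, be more likely to translate correctly.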

  • A VoiceFont Creation Framework for Generating Personalized Voices

    Takashi SAITO  Masaharu SAKAMOTO  

     
    PAPER-Speech Synthesis and Prosody

    Vol: E88-D No:3  Page(s): 525-534

    This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.

  • CIAIR In-Car Speech Corpus--Influence of Driving Status--

    Nobuo KAWAGUCHI  Shigeki MATSUBARA  Kazuya TAKEDA  Fumitada ITAKURA  

     
    LETTER

    Vol: E88-D No:3  Page(s): 578-582

    CIAIR, Nagoya University, has been compiling an in-car speech database since 1999. This paper discusses the basic information contained in this database and an analysis of the effects of driving status based on the database. We have developed a system called the Data Collection Vehicle (DCV), which supports synchronous recording of multi-channel audio data from 12 microphones that can be placed throughout the vehicle, multi-channel video recording from three cameras, and the collection of vehicle-related data. In the compilation process, each subject had conversations with three types of dialog system: a human, a "Wizard of Oz" system, and a spoken dialog system. Vehicle information such as speed, engine RPM, accelerator/brake-pedal pressure, and steering-wheel motion was also recorded. In this paper, we report on the effect that driving status has on phenomena specific to spoken language.

  • Construction and Evaluation of a Large In-Car Speech Corpus

    Kazuya TAKEDA  Hiroshi FUJIMURA  Katsunobu ITOU  Nobuo KAWAGUCHI  Shigeki MATSUBARA  Fumitada ITAKURA  

     
    PAPER-Speech Corpora and Related Topics

    Vol: E88-D No:3  Page(s): 553-561

    In this paper, we discuss the construction of a large in-car spoken dialogue corpus and the results of its analysis. We have developed a system specially built into a Data Collection Vehicle (DCV) that supports the synchronous recording of multichannel audio data from 16 microphones that can be placed in flexible positions, multichannel video data from 3 cameras, and vehicle-related data. Multimedia data has been collected for three sessions of spoken dialogue with different modes of navigation, during an approximately 60-minute drive by each of 800 subjects. We have characterized the collected dialogues across the three sessions. Some characteristics, such as sentence complexity and SNR, are found to differ significantly among the sessions. Linear regression analysis results also clarify the relative importance of various corpus characteristics.

  • Gemination of Consonant in Spontaneous Speech: An Analysis of the "Corpus of Spontaneous Japanese"

    Masako FUJIMOTO  Takayuki KAGOMIYA  

     
    PAPER-Speech Corpora and Related Topics

    Vol: E88-D No:3  Page(s): 562-568

    In Japanese, there is frequent alternation between CV morae and moraic geminate consonants. In this study, we analyzed the phonemic environments of consonant gemination (CG) using the "Corpus of Spontaneous Japanese (CSJ)." The results revealed that the environment in which gemination occurs is, to some extent, parallel to that of vowel devoicing. However, there are two crucial differences. One difference is that the CG tends to occur in a /kVk/ environment, whereas such is not the case for vowel devoicing. The second difference is that when the preceding consonant is /r/, gemination occurs, but not vowel devoicing. These observations suggest that the mechanism leading to CG differs from that which leads to vowel devoicing.

  • Recent Progress in Corpus-Based Spontaneous Speech Recognition

    Sadaoki FURUI  

     
    INVITED PAPER

    Vol: E88-D No:3  Page(s): 366-375

    This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is spontaneous in almost any situation, recognition of spontaneous speech is an area that has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance for spontaneous speech. For this purpose, it is necessary to build large spontaneous speech corpora for constructing acoustic and language models. This paper focuses on various achievements of a Japanese 5-year national project, "Spontaneous Speech: Corpus and Processing Technology," that has recently been completed. Because of various phenomena specific to spontaneous speech, such as filled pauses, repairs, hesitations, repetitions and disfluencies, recognition of spontaneous speech requires various new techniques. These new techniques include flexible acoustic modeling, sentence boundary detection, pronunciation modeling, acoustic as well as language model adaptation, and automatic summarization. In particular, automatic summarization including indexing, a process that extracts important and reliable parts of the automatic transcription, is expected to play an important role in building various speech archives, speech-based information retrieval systems, and human-computer dialogue systems.

  • Robust Dependency Parsing of Spontaneous Japanese Spoken Language

    Tomohiro OHNO  Shigeki MATSUBARA  Nobuo KAWAGUCHI  Yasuyoshi INAGAKI  

     
    PAPER-Speech Corpora and Related Topics

    Vol: E88-D No:3  Page(s): 545-552

    Spontaneously spoken Japanese includes a lot of grammatically ill-formed linguistic phenomena such as fillers, hesitations, inversions, and so on, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the availability and robustness of the method using spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese including fillers, inversions, or dependencies over utterance units. Experimental results reveal that the parsing accuracy reached 87.0%, and we confirmed that it is effective to utilize the location information of a bunsetsu, and the distance information between bunsetsus as stochastic information.

  • Learning Korean Named Entity by Bootstrapping with Web Resources

    Seungwoo LEE  Joohui AN  Byung-Kwan KWAK  Gary Geunbae LEE  

     
    PAPER-Natural Language Processing

    Vol: E87-D No:12  Page(s): 2872-2882

    An important issue in applying machine learning algorithms to Natural Language Processing areas such as Named Entity Recognition tasks is overcoming the lack of tagged corpora. Several bootstrapping methods such as co-training have been proposed as a solution. In this paper, we present a different approach using Web resources. A Named Entity (NE) tagged corpus is generated from the Web using about 3,000 names as seeds. The generated corpus may be of lower quality than a manually tagged corpus, but its size can be increased sufficiently. Several features are developed and a decision list is learned using the generated corpus. Our method is verified by comparing it to both the decision list learned on the manual corpus and the DL-CoTrain method. We also present a two-level classification that cascades highly precise lexical patterns and the decision list to improve the performance.

  • DODDLE II: A Domain Ontology Development Environment Using a MRD and Text Corpus

    Masaki KUREMATSU  Takamasa IWADE  Naomi NAKAYA  Takahira YAMAGUCHI  

     
    PAPER-Knowledge Engineering and Robotics

    Vol: E87-D No:4  Page(s): 908-916

    In this paper, we describe how to exploit a machine-readable dictionary (MRD) and a domain-specific text corpus in supporting the construction of domain ontologies that specify taxonomic and non-taxonomic relationships among given domain concepts. In building taxonomic relationships (hierarchical structure) of domain concepts, some hierarchical structure can be extracted from an MRD with marked subtrees that may be modified by a domain expert, using matching result analysis and trimmed result analysis. In building non-taxonomic relationships (specification templates) of domain concepts, we construct concept specification templates that come from pairs of concepts extracted from a text corpus, using WordSpace and an association rule algorithm. A domain expert modifies the taxonomic and non-taxonomic relationships later. Through case studies with "the Contracts for the International Sales of Goods (CISG)" and "XML Common Business Library (xCBL)", we confirm that our system can support the process of constructing domain ontologies with an MRD and a text corpus.

  • Corpus Based Method of Transforming Nominalized Phrases into Clauses for Text Mining Application

    Akira TERADA  Takenobu TOKUNAGA  

     
    PAPER

    Vol: E86-D No:9  Page(s): 1736-1744

    Nominalization is a linguistic phenomenon in which events usually described in terms of clauses are expressed in the form of noun phrases. Extracting event structures is an important task in text mining applications. To achieve this goal, clauses are parsed and the argument structures of main verbs are extracted from the parsed results. This kind of preprocessing has been commonly done in past research. In order to extract event structures from nominalized phrases as well, we need to establish a technique to transform nominalized phrases into clauses. In this paper, we propose a method to transform nominalized phrases into clauses by using a corpus-based approach. The proposed method first enumerates possible predicate/argument structures by referring to a nominalized phrase (noun phrase) and ranks them based on the frequency of each argument in the corpus. The algorithm based on this method was evaluated using a corpus consisting of 24,626 aviation safety reports in English, and it achieved 78% accuracy in transformation. The algorithm was also evaluated by applying it in a text mining application that extracts events and their cause-effect relations from the texts, where it improved the application's performance.

  • A Statistical Method for Acquiring Knowledge about the Abbreviation Possibility of Some of Multiple Adnominal Phrases

    Hiroyuki SAKAI  Shigeru MASUYAMA  

     
    PAPER

    Vol: E86-D No:9  Page(s): 1710-1718

    This paper proposes a statistical method of acquiring knowledge about the abbreviation possibility of some of multiple adnominal phrases. Our method calculates weight values of multiple adnominal phrases by mutual information, based on the strength of the relation between the adnominal phrases and the modified nouns. Among adnominal phrases, those having relatively low weight values are deleted. Experimental evaluation of our method shows a precision of about 84.1% and a recall of about 59.2%.
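
The abstract does not give the exact weighting formula. As a rough illustration of how mutual information can score the strength of relation between an adnominal phrase and the noun it modifies, the sketch below computes pointwise mutual information from co-occurrence counts. The choice of PMI, the function name, and all counts are assumptions, not the authors' method.

```python
from math import log2


def pmi(pair_count: int, phrase_count: int, noun_count: int, total: int) -> float:
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))) from raw counts."""
    if min(pair_count, phrase_count, noun_count, total) <= 0:
        raise ValueError("all counts must be positive")
    return log2((pair_count * total) / (phrase_count * noun_count))


# Hypothetical counts: a phrase seen 20 times, a noun seen 50 times,
# together 10 times, out of 1000 phrase-noun pairs overall.
strong = pmi(pair_count=10, phrase_count=20, noun_count=50, total=1000)
# A pair that co-occurs exactly as often as chance predicts scores 0.
chance = pmi(pair_count=1, phrase_count=10, noun_count=100, total=1000)
print(strong, chance)
```

Under a scheme like this, a phrase scoring low relative to the others modifying the same noun would be a candidate for deletion.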

  • Development of a Lip-Sync Algorithm Based on an Audio-Visual Corpus

    Jinyoung KIM  Joohun LEE  Katsuhiko SHIRAI  

     
    LETTER-Databases

    Vol: E86-D No:2  Page(s): 334-339

    In this paper, we propose a corpus-based lip-sync algorithm for natural face animation. For this purpose, we constructed a Korean audio-visual (AV) corpus. Based on this AV corpus, we propose a concatenation method of AV units, which is similar to a corpus-based text-to-speech system. For our AV corpus, lip-related parameters were extracted from every video-recorded facial shot in which the speaker reads given texts selected from newspapers. The spoken utterances were labeled with HTK, and prosodic information such as duration, pitch and intensity was extracted as lip-sync parameters. Based on the constructed AV corpus, the basic synthetic units are CVC-syllable units. For the best concatenation performance, the best path is estimated by a general Viterbi search algorithm based on the phonetic environment distance and the prosodic distance. From the computer simulation results, we found that information on not only duration but also pitch and intensity is useful for enhancing the lip-sync performance. The reconstructed lip parameters have values almost equal to those of the original parameters.
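
The path search can be sketched as a generic Viterbi search over candidate units. The two cost callables stand in for the phonetic environment distance and the prosodic distance named in the abstract; how the paper actually defines and combines them is not stated there, so `target_cost`, `join_cost`, and the integer "units" below are hypothetical.

```python
from typing import Callable, Sequence


def viterbi(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    join_cost: Callable[[int, int], float],
) -> tuple[list[int], float]:
    """Return the minimum-cost unit sequence and its total cost.

    candidates[i] lists the units available at position i.
    target_cost(i, u) scores unit u at position i; join_cost(p, u) scores
    concatenating unit u after unit p.
    """
    if not candidates or any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate")
    # best[i][k] = (cost of the best path ending in candidates[i][k], back-pointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            prev_k, prev_cost = min(
                ((k, best[i - 1][k][0] + join_cost(p, u))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda t: t[1],
            )
            row.append((prev_cost + target_cost(i, u), prev_k))
        best.append(row)
    # Backtrack from the cheapest final state.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    total = best[-1][k][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    return path[::-1], total


# Toy example: units are integers, costs are absolute differences.
targets = [1, 2, 3]
path, cost = viterbi(
    [[1, 5], [2, 0], [3, 9]],
    target_cost=lambda i, u: abs(u - targets[i]),
    join_cost=lambda p, u: abs(p - u),
)
print(path, cost)
```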

  • Disambiguating Word Senses in Korean-Japanese Machine Translation by Using Semi-Automatically Constructed Ontology

    Sin-Jae KANG  You-Jin CHUNG  Jong-Hyeok LEE  

     
    PAPER-Natural Language Processing

    Vol: E85-D No:10  Page(s): 1688-1697

    This paper presents a method for disambiguating word senses in Korean-Japanese machine translation by using a language independent ontology. This ontology stores semantic constraints between concepts and other world knowledge, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In order to acquire a language-independent and reasonably practical ontology in a limited time and with less manpower, we extend the existing Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously-built electronic dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus. In practical machine translation systems, our word sense disambiguation method achieved an improvement of average precision by 6.0% for Japanese analysis and by 9.2% for Korean analysis over the method without using an ontology.

  • The Possibility of Magnetic Resonance Imaging-Based Diagnosis of Alzheimer-Type Dementia

    Naoki KODAMA  Tetsuo SHIMADA  Yoshio KOBAYASHI  Kei HIWATASHI  Isao HIYOSHI  Makoto SHIBUKAWA  Yasuhiro KAWASE  Ichiro FUKUMOTO  

     
    LETTER-Medical Engineering

    Vol: E85-D No:3  Page(s): 592-596

    We studied the possibility of making an objective diagnosis of dementia based on radiological findings by evaluating cerebral and hippocampal atrophy and the corpus callosum shape on MRI images in patients with Alzheimer-type dementia, compared with healthy elderly individuals. There was a statistically significant difference in the hippocampus area index, the ventricle area index, and the area ratio for the second, fourth, and fifth parts of the corpus callosum. Discriminant analysis using these three parameters demonstrated a sensitivity of 88.5% and a specificity of 85.7%, suggesting a highly positive diagnostic rate. These results indicate that quantitative MRI measurements could be used for differentiating Alzheimer-type dementia from similar diseases.
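
For readers unfamiliar with the two reported measures, the sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical and are not the study's data, which the abstract does not give.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of actual patients that the classifier flags as patients."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of healthy individuals that the classifier clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for illustration only.
print(sensitivity(true_positives=8, false_negatives=2))
print(specificity(true_negatives=9, false_positives=1))
```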
