IEICE global.ieice.org Site

Keyword Search Result

[Keyword] corpus(44hit)

21-40hit(44hit)

Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method
Goshu NAGINO Makoto SHOZAKAI Tomoki TODA Hiroshi SARUWATARI Kiyohiro SHIKANO

PAPER-Corpus

Vol: E91-D No:3
Page(s): 607-614
This paper proposes a technique for building an effective speech corpus at lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models in two-dimensional space. First, a small number of voice samples per speaker is collected; speaker-adapted acoustic models, trained with the collected utterances, are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located on the periphery of the distribution in the plotted map are selected; a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. This means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. This means that a cost reduction of more than 57% was achieved.
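
The cost-reduction figures in the abstract above can be checked from the speaker counts alone, assuming that collection cost grows in proportion to the number of speakers. That cost model is an assumption made here for illustration; the abstract does not state one. The helper name cost_reduction is hypothetical.

```python
def cost_reduction(selected_speakers: int, baseline_speakers: int) -> float:
    """Fraction of speakers saved, assuming cost scales linearly with speaker count."""
    return 1.0 - selected_speakers / baseline_speakers


# Speaker counts quoted in the abstract.
isolated = cost_reduction(200, 533)     # isolated-word corpus
continuous = cost_reduction(500, 1179)  # continuous word corpus

print(f"isolated-word: {isolated:.1%}, continuous: {continuous:.1%}")
```

Under this linear-cost assumption both values land just above the "more than 62%" and "more than 57%" thresholds reported in the abstract.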
An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus
Jin-Song ZHANG Satoshi NAKAMURA

PAPER-Corpus

Vol: E91-D No:3
Page(s): 615-630
An efficient way to develop large scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called the minimum set, should have a small text size in order to reduce the collection cost. It can be selected by a greedy search algorithm from a large mother text corpus. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units has increased dramatically, making the search a non-trivial problem. In order to improve the search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluates these algorithms in order to show their different characteristics. The experimental results show that the least-to-most-ordered methods successfully achieve smaller objective sets with significantly less computation time than the conventional ones. This algorithm has already been applied to the development of a number of speech corpora, including a large scale phonetically rich Chinese speech corpus, ATRPTH, which played an important role in developing our multi-language translation system.
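
As a rough illustration of the conventional greedy selection mentioned in the abstract above, the following sketch repeatedly picks the sentence that covers the most still-uncovered units. It is a generic greedy set-cover heuristic, not the least-to-most-ordered method itself, whose details the abstract does not give. The function name greedy_select, the toy sentence IDs, and the unit strings are all hypothetical.

```python
def greedy_select(sentences: dict[str, set[str]]) -> list[str]:
    """Pick sentences until every unit occurring in the pool is covered."""
    uncovered = set().union(*sentences.values())
    chosen = []
    while uncovered:
        # Take the sentence covering the most uncovered units.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(sentences), key=lambda s: len(sentences[s] & uncovered))
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen


# Toy "mother corpus": sentence ID -> set of phonetic contextual units.
pool = {
    "s1": {"a-b", "b-c", "c-d"},
    "s2": {"a-b", "d-e"},
    "s3": {"b-c"},
    "s4": {"d-e", "e-f"},
}
selected = greedy_select(pool)
print(selected)
```

Each pass rescans the whole pool, which is why the search becomes expensive as the number of distinct units and sentences grows.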
An EM-Based Approach for Mining Word Senses from Corpora
Thatsanee CHAROENPORN Canasai KRUENGKRAI Thanaruk THEERAMUNKONG Virach SORNLERTLAMVANICH

PAPER-Natural Language Processing

Vol: E90-D No:4
Page(s): 775-782
Manually collecting contexts of a target word and grouping them based on their meanings yields a set of word senses, but the task is quite tedious. Towards automated lexicography, this paper proposes a word-sense discrimination method based on two modern techniques: the EM algorithm and principal component analysis (PCA). The spherical Gaussian EM algorithm enhanced with PCA for robust initialization is proposed to cluster word senses of a target word automatically. Three variants of the algorithm, namely PCA, sGEM, and PCA-sGEM, are investigated using a gold standard dataset of two polysemous words. The clustering result is evaluated using the measures of purity and entropy as well as a more recent measure called normalized mutual information (NMI). The experimental results indicate that the proposed algorithms achieve promising performance in discriminating word senses and that PCA-sGEM outperforms the other two methods to some extent.
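
Two of the evaluation measures named in the abstract above, purity and NMI, can be sketched as follows. The abstract does not say which NMI normalization the authors use; this sketch assumes the geometric-mean form, MI divided by the square root of the product of the two entropies. The cluster-entropy measure is not shown, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Share of items whose gold label is the majority label of their cluster."""
    per_cluster = {}
    for c, y in zip(clusters, labels):
        per_cluster.setdefault(c, Counter())[y] += 1
    return sum(max(cnt.values()) for cnt in per_cluster.values()) / len(labels)


def _entropy(assignments):
    """Shannon entropy (natural log) of a list of assignments."""
    n = len(assignments)
    return -sum((k / n) * math.log(k / n) for k in Counter(assignments).values())


def nmi(clusters, labels):
    """Mutual information normalized by sqrt(H(clusters) * H(labels)) (assumed form)."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    pc, py = Counter(clusters), Counter(labels)
    mi = sum((k / n) * math.log(k * n / (pc[c] * py[y])) for (c, y), k in joint.items())
    denom = math.sqrt(_entropy(clusters) * _entropy(labels))
    return mi / denom if denom > 0 else 0.0


# Toy example: six contexts of one word, two induced clusters, two gold senses.
clusters = [0, 0, 0, 1, 1, 1]
senses = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, senses), nmi(clusters, senses))
```

Purity rewards clusters dominated by one sense but can be inflated by producing many small clusters, which is one reason to report NMI alongside it.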
N-gram Adaptation with Dynamic Interpolation Coefficient Using Information Retrieval Technique
Joon-Ki CHOI Yung-Hwan OH

LETTER-Speech and Hearing

Vol: E89-D No:9
Page(s): 2579-2582
This study presents an N-gram adaptation technique for cases where no additional text data for the adaptation exist. We use a language modeling approach to information retrieval (IR) to collect an appropriate adaptation corpus from the baseline text data. We propose to use a dynamic interpolation coefficient to merge the N-grams, where the interpolation coefficient is estimated from the word hypotheses obtained by segmenting the input speech. Experimental results show that the proposed adapted N-gram always performs better than the background N-gram.
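
The merging step described in the abstract above is commonly written as a linear interpolation of two N-gram distributions. The sketch below shows only that step for a single history, with the coefficient passed in as a plain number. How the paper estimates the coefficient dynamically from word hypotheses is not specified in the abstract and is not modeled here. The function name interpolate, the parameter lam, and the toy probabilities are hypothetical.

```python
def interpolate(p_background: dict[str, float],
                p_adaptation: dict[str, float],
                lam: float) -> dict[str, float]:
    """Return lam * P_adaptation(w|h) + (1 - lam) * P_background(w|h) for one history h."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("interpolation coefficient must lie in [0, 1]")
    vocab = set(p_background) | set(p_adaptation)
    return {
        w: lam * p_adaptation.get(w, 0.0) + (1.0 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


# Toy next-word distributions for one fixed history.
background = {"rain": 0.2, "sun": 0.8}
adaptation = {"rain": 0.6, "sun": 0.4}
merged = interpolate(background, adaptation, lam=0.25)
print(merged)
```

Because both inputs sum to one and the weights sum to one, the merged distribution also sums to one without renormalization.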
Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes
Takashi SAITO

PAPER-Speech Analysis

Vol: E89-D No:3
Page(s): 1100-1106
This paper describes a method of generating F0 contours from natural F0 segmental shapes for speech synthesis. The extracted shapes of the F0 units are basically held invariant by eliminating any averaging operations in the analysis phase and by minimizing modification operations in the synthesis phase. The use of natural F0 shapes has great potential to cover a wide variety of speaking styles with the same framework, including not only read-aloud speech, but also dialogues and emotional speech. A linear-regression statistical model is used to "manipulate" the stored raw F0 shapes to build them up into a sentential F0 contour. Through experimental evaluations, the proposed model is shown to provide stable and robust F0 contour prediction for various speakers. By using this model, linguistically derived information about a sentence can be directly mapped, in a purely data-driven manner, to acoustic F0 values of the sentential intonation contour for a given target speaker.
Construction of Audio-Visual Speech Corpus Using Motion-Capture System and Corpus Based Facial Animation
Tatsuo YOTSUKURA Shigeo MORISHIMA Satoshi NAKAMURA

PAPER

Vol: E88-D No:11
Page(s): 2477-2483
An accurate audio-visual speech corpus is indispensable for talking-heads research. This paper presents our audio-visual speech corpus collection and proposes a head-movement normalization method and a facial motion generation method. The audio-visual corpus contains speech data, movie data on faces, and positions and movements of facial organs. The corpus consists of Japanese phoneme-balanced sentences uttered by a female native speaker. Accurate facial capture is realized by using an optical motion-capture system. We captured high-resolution 3D data by arranging many markers on the speaker's face. In addition, we propose a method of acquiring the facial movements and removing head movements by using an affine transformation to compute the displacements of the facial organs alone. Finally, in order to easily create facial animation from this motion data, we propose a technique for assigning the captured data to the facial polygon model. Evaluation results demonstrate the effectiveness of the proposed facial motion generation method and show the relationship between the number of markers and errors.
Splitting Input for Machine Translation Using N-gram Language Model Together with Utterance Similarity
Takao DOI Eiichiro SUMITA

PAPER-Natural Language Processing

Vol: E88-D No:6
Page(s): 1256-1264
In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.
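
The selection step in the abstract above, choosing among split candidates by edit-distance similarity to a corpus, might look roughly like the sketch below. The abstract does not give the exact similarity formula or how the parts of a candidate are combined. This sketch assumes word-level Levenshtein distance normalized by the longer length, averaged over the parts of each candidate. All function names and the toy data are hypothetical.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (wa != wb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity(a: list[str], b: list[str]) -> float:
    """1.0 for identical word sequences, lower as more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def score(candidate: list[list[str]], corpus: list[list[str]]) -> float:
    """Average, over the parts of a split, of the best similarity to any corpus utterance."""
    return sum(max(similarity(part, u) for u in corpus) for part in candidate) / len(candidate)


def best_split(candidates, corpus):
    return max(candidates, key=lambda c: score(c, corpus))


corpus = [["thank", "you"], ["where", "is", "the", "station"]]
candidates = [
    [["thank", "you", "where", "is", "the", "station"]],    # no split
    [["thank", "you"], ["where", "is", "the", "station"]],  # split after "you"
    [["thank", "you", "where"], ["is", "the", "station"]],  # split after "where"
]
chosen = best_split(candidates, corpus)
print(chosen)
```

In the paper the candidates come from N-gram clues; here they are simply listed by hand.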
A VoiceFont Creation Framework for Generating Personalized Voices
Takashi SAITO Masaharu SAKAMOTO

PAPER-Speech Synthesis and Prosody

Vol: E88-D No:3
Page(s): 525-534
This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, in order to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Secondly, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation, and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.
CIAIR In-Car Speech Corpus--Influence of Driving Status--
Nobuo KAWAGUCHI Shigeki MATSUBARA Kazuya TAKEDA Fumitada ITAKURA

LETTER

Vol: E88-D No:3
Page(s): 578-582
CIAIR, Nagoya University, has been compiling an in-car speech database since 1999. This paper discusses the basic information contained in this database and an analysis of the effects of driving status based on the database. We have developed a system called the Data Collection Vehicle (DCV), which supports synchronous recording of multi-channel audio data from 12 microphones which can be placed throughout the vehicle, multi-channel video recording from three cameras, and the collection of vehicle-related data. In the compilation process, each subject had conversations with three types of dialog system: a human, a "Wizard of Oz" system, and a spoken dialog system. Vehicle information such as speed, engine RPM, accelerator/brake-pedal pressure, and steering-wheel motion was also recorded. In this paper, we report on the effect that driving status has on phenomena specific to spoken language.
Construction and Evaluation of a Large In-Car Speech Corpus
Kazuya TAKEDA Hiroshi FUJIMURA Katsunobu ITOU Nobuo KAWAGUCHI Shigeki MATSUBARA Fumitada ITAKURA

PAPER-Speech Corpora and Related Topics

Vol: E88-D No:3
Page(s): 553-561
In this paper, we discuss the construction of a large in-car spoken dialogue corpus and the result of its analysis. We have developed a system specially built into a Data Collection Vehicle (DCV) which supports the synchronous recording of multichannel audio data from 16 microphones that can be placed in flexible positions, multichannel video data from 3 cameras, and vehicle-related data. Multimedia data has been collected for three sessions of spoken dialogue with different modes of navigation, during an approximately 60-minute drive by each of 800 subjects. We have characterized the collected dialogues across the three sessions. Some characteristics such as sentence complexity and SNR are found to differ significantly among the sessions. Linear regression analysis results also clarify the relative importance of various corpus characteristics.
Gemination of Consonant in Spontaneous Speech: An Analysis of the "Corpus of Spontaneous Japanese"
Masako FUJIMOTO Takayuki KAGOMIYA

PAPER-Speech Corpora and Related Topics

Vol: E88-D No:3
Page(s): 562-568
In Japanese, there is frequent alternation between CV morae and moraic geminate consonants. In this study, we analyzed the phonemic environments of consonant gemination (CG) using the "Corpus of Spontaneous Japanese (CSJ)." The results revealed that the environment in which gemination occurs is, to some extent, parallel to that of vowel devoicing. However, there are two crucial differences. One difference is that the CG tends to occur in a /kVk/ environment, whereas such is not the case for vowel devoicing. The second difference is that when the preceding consonant is /r/, gemination occurs, but not vowel devoicing. These observations suggest that the mechanism leading to CG differs from that which leads to vowel devoicing.
Recent Progress in Corpus-Based Spontaneous Speech Recognition
Sadaoki FURUI

INVITED PAPER

Vol: E88-D No:3
Page(s): 366-375
This paper overviews recent progress in the development of corpus-based spontaneous speech recognition technology. Although speech is spontaneous in almost any situation, recognition of spontaneous speech is an area which has only recently emerged in the field of automatic speech recognition. Broadening the application of speech recognition depends crucially on raising recognition performance for spontaneous speech. For this purpose, it is necessary to build large spontaneous speech corpora for constructing acoustic and language models. This paper focuses on various achievements of a Japanese 5-year national project "Spontaneous Speech: Corpus and Processing Technology" that has recently been completed. Because of various phenomena specific to spontaneous speech, such as filled pauses, repairs, hesitations, repetitions and disfluencies, recognition of spontaneous speech requires various new techniques. These new techniques include flexible acoustic modeling, sentence boundary detection, pronunciation modeling, acoustic as well as language model adaptation, and automatic summarization. In particular, automatic summarization including indexing, a process which extracts important and reliable parts of the automatic transcription, is expected to play an important role in building various speech archives, speech-based information retrieval systems, and human-computer dialogue systems.
Robust Dependency Parsing of Spontaneous Japanese Spoken Language
Tomohiro OHNO Shigeki MATSUBARA Nobuo KAWAGUCHI Yasuyoshi INAGAKI

PAPER-Speech Corpora and Related Topics

Vol: E88-D No:3
Page(s): 545-552
Spontaneously spoken Japanese includes a lot of grammatically ill-formed linguistic phenomena such as fillers, hesitations, inversions, and so on, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the availability and robustness of the method using spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese including fillers, inversions, or dependencies over utterance units. Experimental results reveal that the parsing accuracy reached 87.0%, and we confirmed that it is effective to utilize the location information of a bunsetsu, and the distance information between bunsetsus as stochastic information.
Learning Korean Named Entity by Bootstrapping with Web Resources
Seungwoo LEE Joohui AN Byung-Kwan KWAK Gary Geunbae LEE

PAPER-Natural Language Processing

Vol: E87-D No:12
Page(s): 2872-2882
An important issue in applying machine learning algorithms to Natural Language Processing areas such as Named Entity Recognition tasks is to overcome the lack of tagged corpora. Several bootstrapping methods such as co-training have been proposed as a solution. In this paper, we present a different approach using the Web resources. A Named Entity (NE) tagged corpus is generated from the Web using about 3,000 names as seeds. The generated corpus may have a lower quality than the manually tagged corpus but its size can be increased sufficiently. Several features are developed and the decision list is learned using the generated corpus. Our method is verified by comparing it to both the decision list learned on the manual corpus and the DL-CoTrain method. We also present a two-level classification by cascading highly precise lexical patterns and the decision list to improve the performance.
DODDLE II: A Domain Ontology Development Environment Using a MRD and Text Corpus
Masaki KUREMATSU Takamasa IWADE Naomi NAKAYA Takahira YAMAGUCHI

PAPER-Knowledge Engineering and Robotics

Vol: E87-D No:4
Page(s): 908-916
In this paper, we describe how to exploit a machine-readable dictionary (MRD) and a domain-specific text corpus in supporting the construction of domain ontologies that specify taxonomic and non-taxonomic relationships among given domain concepts. In building taxonomic relationships (hierarchical structure) of domain concepts, some hierarchical structure can be extracted from an MRD with marked subtrees that may be modified by a domain expert, using matching result analysis and trimmed result analysis. In building non-taxonomic relationships (specification templates) of domain concepts, we construct concept specification templates that come from pairs of concepts extracted from a text corpus, using WordSpace and an association rule algorithm. A domain expert modifies taxonomic and non-taxonomic relationships later. Through case studies with "the Contracts for the International Sales of Goods (CISG)" and "XML Common Business Library (xCBL)", we confirm that our system can support the process of constructing domain ontologies with an MRD and a text corpus.
Corpus Based Method of Transforming Nominalized Phrases into Clauses for Text Mining Application
Akira TERADA Takenobu TOKUNAGA

PAPER

Vol: E86-D No:9
Page(s): 1736-1744
Nominalization is a linguistic phenomenon in which events usually described in terms of clauses are expressed in the form of noun phrases. Extracting event structures is an important task in text mining applications. To achieve this goal, clauses are parsed and the argument structures of main verbs are extracted from the parsed results. This kind of preprocessing has been commonly done in past research. In order to extract event structure from nominalized phrases as well, we need to establish a technique to transform nominalized phrases into clauses. In this paper, we propose a method to transform nominalized phrases into clauses by using a corpus-based approach. The proposed method first enumerates possible predicate/argument structures by referring to a nominalized phrase (noun phrase) and ranks them based on the frequency of each argument in the corpus. The algorithm based on this method was evaluated using a corpus consisting of 24,626 aviation safety reports in English, and it achieved 78% accuracy in transformation. The algorithm was also evaluated by applying it in a text mining application that extracts events and their cause-effect relations from the texts, where it improved the application's performance.
A Statistical Method for Acquiring Knowledge about the Abbreviation Possibility of Some of Multiple Adnominal Phrases
Hiroyuki SAKAI Shigeru MASUYAMA

PAPER

Vol: E86-D No:9
Page(s): 1710-1718
This paper proposes a statistical method of acquiring knowledge about the abbreviation possibility of some of multiple adnominal phrases. Our method calculates weight values of multiple adnominal phrases by mutual information based on the strength of the relation between the adnominal phrases and the modified nouns. Among adnominal phrases, those having relatively low weight values are deleted. Experimental evaluation of our method shows a precision of about 84.1% and a recall of about 59.2%.
Development of a Lip-Sync Algorithm Based on an Audio-Visual Corpus
Jinyoung KIM Joohun LEE Katsuhiko SHIRAI

LETTER-Databases

Vol: E86-D No:2
Page(s): 334-339
In this paper, we propose a corpus-based lip-sync algorithm for natural face animation. For this purpose, we constructed a Korean audio-visual (AV) corpus. Based on this AV corpus, we propose a concatenation method of AV units, which is similar to a corpus-based text-to-speech system. For our AV corpus, lip-related parameters were extracted from every video-recorded facial shot in which the speaker reads given texts selected from newspapers. The spoken utterances were labeled with HTK, and such prosodic information as duration, pitch and intensity was extracted as lip-sync parameters. Based on the constructed AV corpus, basic synthetic units are set by CVC-syllable units. For the best concatenation performance, based on the phonetic environment distance and the prosodic distance, the best path is estimated by a general Viterbi search algorithm. From the computer simulation results, we found that information on not only duration but also pitch and intensity is useful for enhancing the lip-sync performance. Moreover, the reconstructed lip parameters have almost equal values to those of the original parameters.
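
The Viterbi search mentioned in the abstract above can be sketched generically as a minimum-cost path through one list of candidate units per position. The abstract does not define the paper's phonetic environment distance or prosodic distance, so the two cost functions below are simple stand-ins, and every name and number is hypothetical.

```python
def viterbi_path(candidates, target_cost, join_cost):
    """Return the minimum-cost unit sequence; candidates[t] lists the units for position t."""
    best = [target_cost(0, u) for u in candidates[0]]
    back = []  # back[t-1][j] = index of the best predecessor of candidates[t][j]
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for u in candidates[t]:
            costs = [best[k] + join_cost(p, u) for k, p in enumerate(candidates[t - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(t, u))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final unit.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


# Toy units: a single lip parameter value per candidate unit.
targets = [0.0, 0.9, 0.1]
candidates = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.0]]
path = viterbi_path(
    candidates,
    target_cost=lambda t, u: abs(u - targets[t]),  # stand-in for the target-side distance
    join_cost=lambda p, u: 0.25 * abs(p - u),      # stand-in for the concatenation distance
)
print(path)
```

The dynamic program keeps only the best cost per candidate at each position, so its running time grows with the square of the candidates per position rather than exponentially with sequence length.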
Disambiguating Word Senses in Korean-Japanese Machine Translation by Using Semi-Automatically Constructed Ontology
Sin-Jae KANG You-Jin CHUNG Jong-Hyeok LEE

PAPER-Natural Language Processing

Vol: E85-D No:10
Page(s): 1688-1697
This paper presents a method for disambiguating word senses in Korean-Japanese machine translation by using a language independent ontology. This ontology stores semantic constraints between concepts and other world knowledge, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In order to acquire a language-independent and reasonably practical ontology in a limited time and with less manpower, we extend the existing Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously-built electronic dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus. In practical machine translation systems, our word sense disambiguation method achieved an improvement of average precision by 6.0% for Japanese analysis and by 9.2% for Korean analysis over the method without using an ontology.
The Possibility of Magnetic Resonance Imaging-Based Diagnosis of Alzheimer-Type Dementia
Naoki KODAMA Tetsuo SHIMADA Yoshio KOBAYASHI Kei HIWATASHI Isao HIYOSHI Makoto SHIBUKAWA Yasuhiro KAWASE Ichiro FUKUMOTO

LETTER-Medical Engineering

Vol: E85-D No:3
Page(s): 592-596
We studied the possibility of making an objective diagnosis of dementia based on radiological findings by evaluating cerebral and hippocampal atrophy and the corpus callosum shape on MRI images in patients with Alzheimer-type dementia, compared with healthy elderly individuals. There was a statistically significant difference in the hippocampus area index, the ventricle area index, and the area ratio for the second, fourth, and fifth parts of the corpus callosum. Discriminant analysis using these three parameters demonstrated a sensitivity of 88.5% and a specificity of 85.7%, suggesting a highly positive diagnostic rate. These results indicate that quantitative MRI measurements could be used for differentiating Alzheimer-type dementia from similar diseases.
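
For readers unfamiliar with the two rates quoted in the abstract above, the standard definitions are sketched below. The counts are purely illustrative and are not taken from the paper; the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual patients that the classifier flags as patients."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of healthy individuals that the classifier clears as healthy."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (NOT the study's data).
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=40, false_pos=10)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```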
