Appropriate language modeling is one of the major issues for automatic transcription of spontaneous speech. We propose an adaptation method for statistical language models based on both topic and speaker characteristics. This approach is applied for automatic transcription of meetings and panel discussions, in which multiple participants speak on a given topic in their own speaking style. A baseline language model is a mixture of two models, which are trained with different corpora covering various topics and speakers, respectively. Then, probabilistic latent semantic analysis (PLSA) is performed on the same respective corpora and the initial ASR result to provide two sets of unigram probabilities conditioned on input speech, with regard to topics and speaker characteristics, respectively. Finally, the baseline model is adapted by scaling N-gram probabilities with these unigram probabilities. For speaker adaptation purpose, we make use of a portion of the Corpus of Spontaneous Japanese (CSJ) in which a large number of speakers gave talks for given topics. Experimental evaluation with real discussions showed that both topic and speaker adaptation reduced test-set perplexity, and in total, an average reduction rate of 8.5% was obtained. Furthermore, improvement on word accuracy was also achieved by the proposed adaptation method.
Minh LE NGUYEN Masaru FUKUSHI Susumu HORIGUCHI
This paper describes a new probabilistic sentence reduction method using maximum entropy model. In contrast to previous methods, the proposed method has the ability to produce multiple best results for a given sentence, which is useful in text summarization applications. Experimental results show that the proposed method improves on earlier methods in both accuracy and computation time.
This paper discusses real-time language recognition by 1-dimensional one-way cellular automata (OCAs) and two-way cellular automata (CAs), focusing on limitations of the parallel computation power. To clarify the limitations, we investigate real-time recognition of cyclic strings of the form uk with u {0,1}+ and k 2. We show a version of pumping lemma for recognizing cyclic strings by OCAs, which can be used for proving that several languages are not recognizable by OCAs in real time. The paper also discusses the real-time language recognition of CAs by prefix and postfix computation, in which every prefix or postfix of an input string is also accepted, if the prefix or postfix is in the language. It is shown that there are languages L
So-Young PARK Yong-Jae KWAK Joon-Ho LIM Hae-Chang RIM
In this paper, we propose a probabilistic feature-based parsing model for head-final languages, which can lead to an improvement of syntactic disambiguation while reducing the parsing cost related to lexical information. For effective syntactic disambiguation, the proposed parsing model utilizes several useful features such as a syntactic label feature, a content feature, a functional feature, and a size feature. Moreover, it is designed to be suitable for representing word order variation of non-head words in head-final languages. Experimental results show that the proposed parsing model performs better than previous lexicalized parsing models, although it has much less dependence on lexical information.
This article introduces a novel approach to model phonology and morphosyntax in morpheme unit-based speech recognizers. The proposed methods are evaluated on a Hungarian newspaper dictation task that requires modeling over 1 million different word forms. The architecture of the recognition system is based on the weighted finite-state transducer (WFST) paradigm. The vocabulary units used in the system are morpheme-based in order to provide sufficient coverage of the large number of word-forms resulting from affixation and compounding. Besides the basic pronunciation model and the morpheme N-gram language model we evaluate a novel phonology model and the novel stochastic morphosyntactic language model (SMLM). Thanks to the flexible transducer-based architecture of the system, these new components are integrated seamlessly with the basic modules with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error-rates as well as the sizes of the recognition networks in two configurations. In one configuration we use only the N-gram model while in the other we use the combined model. The proposed stochastic morphosyntactic language model decreases the morpheme error rate by between 1.7 and 7.2% relatively when compared to the baseline trigram system. The proposed phonology model reduced the error rate by 8.32%. The morpheme error-rate of the best configuration is 18% and the best word error-rate is 22.3%.
A large body of research in the measurement of software complexity at code level has been conducted, but little effort has been made to measure the architectural-level complexity of a software system. In this paper, we propose some architectural-level metrics which are appropriate for evaluating the architectural attributes of a software system. The main feature of our approach is to assess the architectural-level complexity of a software system by analyzing its formal architectural specification, and therefore the process of metric computation can be automated completely.
Any utterances of dialogue, spoken language or sign language, have functions that enable recipients to achieve real-time and easy understanding and to control conversation smoothly in spite of its volatile characteristics. In this paper, we present evidence of these functions obtained experimentally. Prosody plays a very important role not only in spoken language (aural language) but also in sign language (visual language) and finger braille (tactile language). Skilled users of a language may detect word boundaries in utterances and estimate sentence structure immediately using prosody. The gestures and glances of a recipient may influence the utterances of the sender, leading to amendments of the contents of utterances and smooth exchanges in turn. Individuality and emotion in utterances are also very important aspects of effective communication support systems for persons with disabilities even more so than for those non-disabled persons. The trials described herein are universal in design. Some trials carried out to develop these systems are also reported.
Neal LESH Joe MARKS Charles RICH Candace L. SIDNER
In 1960, the famous computer pioneer J.C.R. Licklider described a vision for human-computer interaction that he called "man-computer symbiosis. " Licklider predicted the development of computer software that would allow people "to think in interaction with a computer in the same way that you think with a colleague whose competence supplements your own. " More than 40 years later, one rarely encounters any computer application that comes close to capturing Licklider's notion of human-like communication and collaboration. We echo Licklider by arguing that true symbiotic interaction requires at least the following three elements: a complementary and effective division of labor between human and machine; an explicit representation in the computer of the user's abilities, intentions, and beliefs; and the utilization of nonverbal communication modalities. We illustrate this argument with various research prototypes currently under development at Mitsubishi Electric Research Laboratories (USA).
Kiyoshi HOSHINO Ichiro KAWABUCHI
The purpose of this study is to design a humanoid robotic hand system that is capable of conveying feelings and sensitivities by finger movement for the non-verbal communication between men and robots in the near future. In this paper, studies have been made in four steps. First, a small-sized and light-weight robotic hand was developed to be used as the humanoid according to the concept of extracting required minimum motor functions and implementing them to the robot. Second, basic characteristics of the movement were checked by experiments, simple feedforward control mechanism was designed based on velocity control, and a system capable of tracking joint time-series change command with arbitrary pattern input was realized. Third, tracking performances with regard to sinusoidal input with different frequencies were studied for evaluation of the system thus realized, and space- and time-related accuracy were investigated. Fourth, the sign language motions were generated as examples of information transmission by finger movement. A series of results thus obtained indicated that this robotic hand is capable of transmitting information promptly with comparatively high accuracy through the movement.
We present a method for recognition of continuous Korean Sign Language (KSL). In the paper, we consider the segmentation problem of a continuous hand motion pattern in KSL. For this, we first extract sign sentences by removing linking gestures between sign sentences. We use a gesture tension model and fuzzy partitioning. Then, each sign sentence is disassembled into a set of elementary motions (EMs) according to its geometric pattern. The hidden Markov model is adopted to classify the segmented individual EMs.
Ekkarit MANEENOI Visarut AHKUPUTRA Sudaporn LUKSANEEYANAWIN Somchai JITAPUNKUL
This paper presents a study on acoustic modeling for speech recognition of predominantly monosyllabic languages. Various speech units used in speech recognition systems have been investigated. To evaluate the effectiveness of these acoustic models, the Thai language is selected, since it is a predominantly monosyllabic language and has a complex vowel system. Several experiments have been carried out to find the proper speech unit that can accurately create acoustic model and give a higher recognition rate. Results of recognition rates under different acoustic models are given and compared. In addition, this paper proposes a new speech unit for speech recognition, namely onset-rhyme unit. Two models are proposed-the Phonotactic Onset-Rhyme Model (PORM) and the Contextual Onset-Rhyme Model (CORM). The models comprise a pair of onset and rhyme units, which makes up a syllable. An onset comprises an initial consonant and its transition towards the following vowel. Together with the onset, the rhyme consists of a steady vowel segment and a final consonant. Experimental results show that the onset-rhyme model improves on the efficiency of other speech units. The onset-rhyme model improves on the accuracy of the inter-syllable triphone model by nearly 9.3% and of the context-dependent Initial-Final model by nearly 4.7% for the speaker-dependent systems using only an acoustic model, and 5.6% and 4.5% for the speaker-dependent systems using both acoustic and language model respectively. The results show that the onset-rhyme models attain a high recognition rate. Moreover, they also give more efficiency in terms of system complexity.
Sadaki HIROSE Kunifumi TSUDA Yasuhiro OGOSHI Haruhiko KIMURA
Watson-Crick automata, recently introduced in, are new types of automata in the DNA computing framework, working on tapes which are double stranded sequences of symbols related by a complementarity relation, similar to a DNA molecule. The automata scan separately each of the two strands in a corelated mannar. Some restricted variants of them were also introduced and the relationship between the families of languages recognized by them were investigated in. In this paper, we clarify some relations between the families of languages recognized by the restricted variants of Watson-Crick finite automata and the families in the Chomsky hierarchy.
Giedre SABALIAUSKAITE Shinji KUSUMOTO Katsuro INOUE
For more than twenty-five years software inspections have been considered an effective method for defect detection. Inspections have been investigated through controlled experiments in university environment and industry case studies. However, in most cases software inspections have been used for defect detection in documents of conventional structured development process. Therefore, there is a significant lack of information about how inspections should be applied to Object-Oriented artifacts, such as Object-Oriented code and design diagrams. In addition, extensive work is needed to determine whether some inspection techniques can be more beneficial than others. Most inspection experiments include inspection meetings after individual inspection is completed. However, several researchers suggested that inspection meetings may not be necessary since an insignificant number of new defects are found as a result of inspection meeting. Moreover, inspection meetings have been found to suffer from process loss. This paper presents the findings of a controlled experiment that was conducted to investigate the performance of individual inspectors as well as 3-person teams in Object-Oriented design document inspection. Documents were written using the notation of Unified Modelling Language. Two reading techniques, namely Checklist-based reading (CBR) and Perspective-based reading (PBR), were used during experiment. We found that both techniques are similar with respect to defect detection effectiveness during individual inspection as well as during inspection meetings. Investigating the usefulness of inspection meetings, we found out that the teams that used CBR technique exhibited significantly smaller meeting gains (number of new defect first found during team meeting) than meeting losses (number of defects first identified by an individual but never included into defect list by a team); meanwhile the meeting gains were similar to meeting losses of the teams that used PBR technique. Consequently, CBR 3-person team meetings turned out to be less beneficial than PBR 3-person team meetings.
Keiichi KATAMINE Masanobu UMEDA Isao NAGASAWA Masaaki HASHIMOTO
The modeling of an application domain and its specific knowledge description language are important for developing knowledge-based systems. A rapid-prototyping approach is suitable for such developments since in this approach the modeling and language development are processed simultaneously. However, programming languages and their supporting environments which are usually used for prototyping are not necessarily adequate for developing practical applications. We have been developing an integrated development environment for knowledge-based systems, which supports all the development phases from the early prototyping phase to final commercial development phase. The environment called INSIDE is based on a Prolog abstract machine, and provides all of the functions required for the development of practical applications in addition to the standard Prolog features. This enables the development of both prototypes and practical applications in the same environment. Moreover, their efficient development and maintenance can be achieved. In addition, the effectiveness of INSIDE is described by examples of its practical application.
Chuzo IWAMOTO Tomoka YOKOUCHI Kenichi MORITA Katsunobu IMAI
This paper investigates prefix computations on Iterative Arrays (IAs) with sequential input/output mode. We show that, for any language L accepted by a linear-time IA, there is an IA which, given an infinite string a1a2 ai, generates the values of χL(a1),χL(a1a2),,χL(a1a2 ai), at steps 4,16,,4i2,, respectively. Here, χL:Σ*{0,1} is the characteristic function of the language L
Shorin KYO Takuya KOGA Shin'ichiro OKAZAKI Ichiro KURODA
This paper describes a 51.2 GOPS video recognition processor that provides a cost effective device solution for vision-based intelligent cruise control (ICC) applications. By integrating 128 4-way VLIW (Very Low Instruction Word) processing elements and operating at 100 MHz, the processor achieves to provide a computation power enough for a weather robust lane mark and vehicle detection function written in a high level programming language, to run in video rate, while at the same time it satisfies power efficiency requirements of an in-vehicle LSI. Basing on four basic parallel methods and a software environment including an optimizing compiler of an extended C language and video-based GUI tools, efficient development of real-time video recognition applications that effectively utilize the 128 processing elements are facilitated. Benchmark results show that, this processor can provide a four times better performance compared with a 2.4 GHz general purpose micro-processor.
Thanyapat SAKUNKONCHAK Satoshi KOMATSU Masahiro FUJITA
SpecC language is designated to handle the design of entire system from specification to implementation and of hardware/software co-design. Concurrency is one of the features of SpecC which expresses the parallel execution of processes. Describing the systems which contain concurrent behaviors would have some data exchanging or transferring among them. Therefore, the synchronization semantics (notify/wait) of events should be incorporated. The actual design, which is usually sophisticated by its characteristic and functionalities, may contain a bunch of event synchronization codes. This will make the design difficult and time-consuming to verify. In this paper, we introduce a technique which helps verifying the synchronization of events in SpecC. The original SpecC code containing synchronization semantics is parsed and translated into a Boolean SpecC code. The difference decision diagrams (DDDs) is used to verify for event synchronization on Boolean SpecC code. The counter examples for tracing back to the original source are given when the verification results turn out to be unsatisfied. Here we also introduce idea on automatically refinement when the results are unsatisfied and preset some preliminary results.
Yoshiyuki TAKEDA Kyoji UMEMURA Eiko YAMAMOTO
Determining indexing strings is an important factor in information retrieval. Ideally, the strings should be words that represent documents or queries. Although any single word may be the first candidate for indexing strings for an English corpus, it may not be ideal due to the existence of compound nouns, which are often good indexing strings, and which often depend on the genre of the corpus used. The situation is even worse in Japanese or Chinese where the words are not separated by spaces. In this paper, we propose a method of determining indexing strings based on statistical analysis. The novel features of our method are to make the most of the statistical measure called "adaptation" and not to use language-dependent resources such as dictionaries and stop word lists. In evaluating our method using a Japanese test collection, we found that it actually improves the precision of information retrieval systems.
Zenshiro KAWASAKI Keiji SHIBATA Masato TAJIMA
This paper presents an extension of the database query language SQL to include queries against a database with natural language annotations. The proposed scheme is based on Concept Coupling Model, a language model for handling natural language sentence structures. Integration of the language model with the conventional relational data model provides a unified environment for manipulating information sources comprised of relational tables and natural language texts.
In the last three decades of the 20th Century, research in speech recognition has been intensively carried out worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small vocabulary keyword recognition over dial-up telephone lines, to medium size vocabulary voice interactive command and control systems for business automation, to large vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder a widespread deployment of applications and services. On one hand, fast progress was observed in statistical speech and language modeling. On the other hand only spotty successes have been reported in applying knowledge sources in acoustics, speech and language science to improving speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration for incorporating technology advances in both statistical modeling and knowledge integration into going beyond the current speech recognition limitations and benefiting the society in the 21st century.