In the last three decades of the 20th century, research in speech recognition was carried out intensively worldwide, spurred on by advances in signal processing, algorithms, architectures, and hardware. Recognition systems have been developed for a wide variety of applications, ranging from small-vocabulary keyword recognition over dial-up telephone lines, to medium-size-vocabulary voice-interactive command and control systems for business automation, to large-vocabulary speech dictation, spontaneous speech understanding, and limited-domain speech translation. Although we have witnessed many new technological promises, we have also encountered a number of practical limitations that hinder widespread deployment of applications and services. On one hand, fast progress has been observed in statistical speech and language modeling. On the other hand, only spotty successes have been reported in applying knowledge sources in acoustics, speech, and language science to improving speech recognition performance and robustness to adverse conditions. In this paper we review some key advances in several areas of speech recognition. A bottom-up detection framework is also proposed to facilitate worldwide research collaboration on incorporating technology advances in both statistical modeling and knowledge integration, so as to go beyond current speech recognition limitations and benefit society in the 21st century.
Hanmin JUNG Gary Geunbae LEE Won Seug CHOI KyungKoo MIN Jungyun SEO
This paper describes a highly portable multilingual question answering system on multiple relational databases. We apply techniques that were verified on open-domain text-based question answering, such as semantic categories and pattern-based grammars, to natural language interfaces to relational databases. Lexico-semantic patterns (LSP) and multi-level grammars achieve portability across languages, domains, and DB management systems. The LSP-based linguistic processing does not require deep analysis that sacrifices robustness and flexibility, but can handle delicate natural language questions. To maximize portability, we divide three dependency factors into the following two parts: the language-dependent part goes into front-end linguistic analysis, and the domain-dependent and database-dependent parts go into back-end SQL query generation. We also support session-based dialog by preserving the SQL queries created from the user's previous questions and then regenerating a new SQL query for each successive question. In experiments with 779 queries, the only errors were constraint-missing errors, which can be easily corrected by adding new terms; their rates were 2.25% for English and 5.67% for Korean.
Shigeru YOSHIDA Takashi MORIHARA Hironori YAHAGI Noriko ITANI
16-bit Asian language codes cannot be compressed well by conventional text compression schemes that use 8-bit sampling. Previously, we reported the application of a word-based text compression method that uses 16-bit sampling to the compression of Japanese texts. This paper describes our further efforts in applying a word-based method with a static canonical Huffman encoder to both Japanese and Chinese texts. The method was proposed to support a multilingual environment: the word dictionary and the canonical Huffman code table are replaced with those appropriate for the respective language. A computer simulation showed that this method is effective for both languages. The obtained compression ratio was a little less than 0.5 without the Markov context, and around 0.4 when accounting for the first-order Markov context.
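As a rough illustration of the word-based canonical Huffman coding mentioned in the abstract above, the following sketch derives code lengths from word frequencies and then assigns canonical codes. It is not the authors' encoder: the function names, the tie-breaking rule, and the toy word list are assumptions made for illustration only, and no Markov context is modeled.

```python
import heapq
from collections import Counter


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code over the given frequencies."""
    if len(freqs) == 1:
        # A single symbol still needs one bit.
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes: sort by (length, symbol) and count upward,
    shifting left whenever the code length increases."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


# Toy "word" stream; a real system would use dictionary words of the target language.
words = ["a", "a", "a", "a", "b", "b", "c", "d"]
table = canonical_codes(huffman_code_lengths(Counter(words)))
encoded = "".join(table[w] for w in words)
```

Because canonical codes are fully determined by the code lengths, a decoder only needs the word list and the length of each code, which is one reason a static table can be swapped per language.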
Sin-Jae KANG You-Jin CHUNG Jong-Hyeok LEE
This paper presents a method for disambiguating word senses in Korean-Japanese machine translation by using a language-independent ontology. This ontology stores semantic constraints between concepts and other world knowledge, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In order to acquire a language-independent and reasonably practical ontology in a limited time and with less manpower, we extend the existing Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built electronic dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus. In practical machine translation systems, our word sense disambiguation method improved average precision by 6.0% for Japanese analysis and by 9.2% for Korean analysis over the method that does not use an ontology.
Kazuyuki TAKAGI Rei OGURO Kazuhiko OZEKI
Experiments were conducted to examine a language-modeling approach to improving noisy speech recognition performance. By adopting appropriate word strings as new units of processing, speech recognition performance was improved by acoustic effects as well as by test-set perplexity reduction. Three kinds of word string language models were evaluated, whose additional lexical entries were selected based on combinations of part-of-speech information, word length, occurrence frequency, and the log likelihood ratio of the hypotheses about the bigram frequency. All three word string models reduced errors in broadcast news speech recognition, and also lowered test-set perplexity. The word string model based on the log likelihood ratio exhibited the best improvement for noisy speech recognition: deletion errors were reduced by 26%, substitution errors by 9.3%, and insertion errors by 13% in the experiments using the speaker-dependent, noise-adapted triphone. The effectiveness of word string models in reducing errors was more prominent for noisy speech than for studio-clean speech.
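The abstract above does not give the formula for the log likelihood ratio. One common choice for scoring bigrams is the binomial likelihood ratio test, which compares the hypothesis that the second word is independent of the first against the hypothesis that it depends on it. The sketch below assumes that formulation; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def _log_l(k, n, p):
    """Log-likelihood of k successes in n Bernoulli trials with success probability p."""
    # Clamp p away from 0 and 1 so that log() is always defined.
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def bigram_llr(c12, c1, c2, n):
    """Return -2 log(lambda) for the bigram (w1, w2).

    c12: count of the bigram "w1 w2"
    c1:  count of w1
    c2:  count of w2
    n:   total number of tokens
    """
    p = c2 / n                      # independence: P(w2) regardless of previous word
    p1 = c12 / c1                   # dependence: P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)      # dependence: P(w2 | not w1)
    log_lambda = (
        _log_l(c12, c1, p) + _log_l(c2 - c12, n - c1, p)
        - _log_l(c12, c1, p1) - _log_l(c2 - c12, n - c1, p2)
    )
    return -2.0 * log_lambda


# Hypothetical counts: a strongly associated pair scores higher than a weakly associated one.
strong = bigram_llr(90, 100, 200, 1000)
weak = bigram_llr(30, 100, 200, 1000)
```

Under this assumption, word pairs with a high score would be the candidates to add to the lexicon as single word-string units.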
Tiansheng XU Zenshiro KAWASAKI Keiji TAKIDA Zheng TANG
This paper presents a child verb learning model mainly based on syntactic bootstrapping. The model automatically learns 4-5-year-old children's linguistic knowledge of verbs, including subcategorization frames and thematic roles, using a text in dialogue format. Subcategorization frame acquisition of verbs is guided by the assumption of the existence of nine verb prototypes. These verb prototypes are extracted based on syntactic bootstrapping and some psycholinguistic studies. Thematic roles are assigned by syntactic bootstrapping and other psycholinguistic hypotheses. The experiments are performed on the data from the CHILDES database. The results show that the learning model successfully acquires linguistic knowledge of verbs and also suggest that psycholinguistic studies of child verb learning may provide important hints for linguistic knowledge acquisition in natural language processing (NLP).
Kazuhisa OKADA Akihisa YAMADA Takashi KAMBE
The Bach compiler is a behavioral synthesis tool that synthesizes RT-level circuits from behavioral descriptions written in the Bach C language. It shortens the design period of LSI and helps designers concentrate on algorithm design and refinement. In this paper, we propose methods for optimizing the area and performance of algorithms described in Bach C. In our experiments, we optimized a Viterbi decoder algorithm using our proposed methods and synthesized the circuit using the Bach compiler. The circuit produced using Bach is both smaller and faster than the hand-coded register transfer level (RTL) design. This result indicates that the Bach compiler produces high-quality results and that the Bach C language is effective for describing the behavior of hardware at a high level.
Verb phrases are sometimes omitted in natural language (ellipsis). It is necessary to resolve verb phrase ellipses in language understanding, machine translation, and dialogue processing. This paper describes a practical way to resolve verb phrase ellipses by using surface expressions and examples. To make heuristic rules for ellipsis resolution, we classified verb phrase ellipses by checking whether or not the referent of a verb phrase ellipsis appears in the surrounding sentences. We experimented with the resolution of verb phrase ellipses on a novel and obtained a recall rate of 73% and a precision rate of 66% on test sentences. When the referent of a verb phrase ellipsis appeared in the surrounding sentences, the accuracy rate was high. When the referent did not appear in the surrounding sentences, however, the accuracy rate was not so high. Since the analysis of this phenomenon is very difficult, it is valuable to propose a way of solving the problem to a certain extent. As corpora become larger and machine performance improves, corpus-based methods will become effective.
Pisana PLACIDI Leonardo VERDUCCI Guido MATRELLA Luca ROSELLI Paolo CIAMPOLINI
In this paper, the characteristics of a digital system dedicated to the fast execution of the FDTD algorithm, widely used for electromagnetic simulation, are presented. The system is conceived as a module communicating with a host personal computer via a PCI bus, and is based on a VLSI ASIC that implements the "field-update" engine. The system structure is defined by means of a hardware description language, which keeps the high-level system specification independent of the actual fabrication technology. A virtual implementation of the system has been carried out by mapping this description in a standard-cell style onto a commercial 0.35 µm technology. Simulations show that a significant speed-up can be achieved with respect to state-of-the-art software implementations of the same algorithm.
In this paper, we describe recent trends in automatic speech recognition. First, we point out that the current state of the art in speech recognition by machines is admittedly inferior to the ability of human beings. In particular, we assert that the improvement of acoustic models is necessary. Second, we describe robust feature parameters for noisy environments, which are important in practical usage. We then indicate that a large amount of training data collected in the same environment as the recognition stage is useful from the viewpoints of information theory and pattern recognition. Third, we discuss acoustic models and language models, which are central issues in speech recognition techniques. The principle and limitations of the hidden Markov model (HMM) and recent extended models are then discussed. The role of language models is to eliminate improbable candidate words, that is, to reduce the search space. In other words, language models having smaller entropy are preferable. From this standpoint, we survey stochastic language models. Finally, we state some points that deserve attention when constructing speech recognition systems.
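The remark above that language models with smaller entropy are preferable can be made concrete with a small calculation. The sketch below computes the per-word cross-entropy of a model on a test sequence and the corresponding perplexity, which can be read as the effective number of equally likely candidate words at each position. The function names and the probability values are illustrative assumptions, not figures from the paper.

```python
import math


def cross_entropy_bits(probs):
    """Average negative log2 probability per word.

    probs holds the probability the language model assigned to each word
    of the test sequence, given its history.
    """
    return -sum(math.log2(p) for p in probs) / len(probs)


def perplexity(probs):
    """Perplexity is 2 raised to the per-word cross-entropy in bits."""
    return 2.0 ** cross_entropy_bits(probs)


# A model that spreads its mass uniformly over 4 candidates at every step
# has 2 bits of entropy per word and a perplexity of 4.
uniform = [0.25] * 8
# A sharper (hypothetical) model assigns higher probability to the words that occur.
sharper = [0.5] * 8
```

Here the sharper model has lower entropy and lower perplexity, so the recognizer has fewer plausible candidates to search at each step.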
Masaki NAKANISHI Takao INDOH Kiyoharu HAMAGUCHI Toshinobu KASHIWABARA
The class NQP was proposed as the class of problems that are solvable by non-deterministic quantum Turing machines in polynomial time. In this paper, we introduce non-deterministic quantum finite automata in which the same non-determinism as in non-deterministic quantum Turing machines is applied. We compare non-deterministic quantum finite automata with the classical counterparts, and show that (unlike the case of classical finite automata) the class of languages recognizable by non-deterministic quantum finite automata properly contains the class of all regular languages.
Tomoya MORINAGA Go OGOSE Tadashi OHTA
This paper proposes an Active Networks architecture for VoIP gateways. In the proposed architecture, a declarative language is used instead of a procedural language to describe the up-loaded program. This reduces the size of the up-loaded program and increases the flexibility with which it can be described. The VoIP gateway can then provide highly flexible services. An experimental system was implemented for feasibility studies. The following were confirmed: the specification of the declarative language used for describing the up-loaded program, the basic functionalities of an interpreter for the language, and the execution control program that executes component programs stored in the node beforehand.
Decentralized XML databases are often used in Electronic Commerce (EC) business models such as e-brokers on the Web. To flexibly model such applications, we need a modeling language for EC business processes. To this end, we have adopted a query language approach and have designed a query language, called XBML, for decentralized XML databases used in EC businesses. In this paper, we explain and validate the functionality of XBML by specifying e-broker business models and describe the implementation of the XBML server, focusing on the distributed query processing.
Chin-Hwa KUO David WIBLE Nai-Lung TSAO
The design and implementation of a novel English writing environment is described. The system integrates modern computer and networking technologies with analytical tools from linguistics and language pedagogy to construct an advanced English writing environment. The system is not only suitable for students learning English, but also of benefit to teachers in making comments and detecting learners' common difficulties. Furthermore, the collected essays from students and comments from teachers constitute a useful learner corpus. This is also of benefit to researchers in analyzing learners' persistent errors. In order to allow global access from the Internet, the system is web-based. Users, for example, students, teachers, and researchers, may access the system through web browsers. The system was developed in a cooperative effort between the Computers And Networking (CAN) laboratory and the Research in English Acquisition and Pedagogy (REAP) Group at Tamkang University. The system has been piloted by six English faculty members at Tamkang University and is currently being used in five high schools in Taiwan. The learner corpus currently consists of over 800,000 word tokens of learners' writing.
Liang CHEN Naoyuki TOKUDA Akira NAGAI
To improve the unstable performance of traditional keyword-based search engines caused by ambiguities of natural language such as synonymy and/or polysemy, we have developed a new probabilistic information retrieval system based on a DLSI (differential latent semantic index) space. The new method exploits a most likelihood posteriori function that provides a measure of reliability in retrieving the document in the database that most closely matches the query document. Our simple experiment provides supporting evidence for the validity of the theory, which is capable of capturing the intricate variability in word usage and thus contributes to a more robust, context-contingent search engine.
Sangkyung KIM Kyungsup SUN Sunshin AN
This paper describes distributed communications based on a new paradigm of middleware platform called the typed bus (TB) platform. While traditional middleware platforms provide the same type of communication path between distributed objects, the TB platform provides typed buses, which are communication paths differentiated according to the applications' communication characteristics. Each typed bus represents a unique communication type and controls communications between distributed objects according to its type, much as a hardware system bus constrains communication between hardware components. Distributed communications are achieved via typed buses, which check whether the communications comply with their types. In this paper we propose the architecture of the TB platform, introduce the TB Type Definition Language used to specify a typed bus's type, and describe an implementation of our platform.
Marcus Vinicius LAMAR Md. Shoaib BHUIYAN Akira IWATA
This paper presents a new neural network structure, called Temporal-CombNET (T-CombNET), dedicated to time series analysis and classification. It has been developed from a large-scale neural network structure, CombNET-II, which is designed to deal with a very large vocabulary, such as in Japanese character recognition. Our specific modifications of the original CombNET-II model allow it to perform temporal analysis and to be used in recognition systems for a large set of human movements. In the T-CombNET structure, one of the most important parameters to be set is the space division criterion. In this paper we analyze some practical approaches and present a criterion based on Interclass Distance Measurement. The T-CombNET performance is analyzed by applying it to a practical problem, Japanese Kana finger spelling recognition. The results obtained show a superior recognition rate for the proposed T-CombNET structure when compared with other structures, such as Multi-Layer Perceptron, Learning Vector Quantization, Elman and Jordan Partially Recurrent Neural Networks, CombNET-II, and k-NN.
Masami SHISHIBORI Kazuaki ANDO Yuuichirou KASHIWAGI Jun-ichi AOE
Natural language interface systems can accept less restricted queries from users than other systems; however, they cannot understand erroneous sentences that include syntax errors, unknown words, or misspellings. To realize a superior natural language interface, automatic correction of erroneous sentences is one of the problems to be solved. Applying LR parsing strategies is a well-known approach to robust error recovery. This method achieves high correction accuracy, but it takes a great deal of time to parse a sentence, so reducing the time cost is an important task. In this paper, we propose a method that improves time efficiency while keeping the correction accuracy of the traditional method. The method makes use of a new parsing table that denotes the states to be entered after accepting each symbol. By using this table, the symbol located just after the error position can be used to select correction symbols; as a result, the number of candidates produced in the correction process is reduced, and a fast system can be realized. Experimental results using 1,050 sentences containing erroneous characters show that this method corrects error points 69 times faster than the traditional method while keeping the same correction accuracy.
Fattaneh TAGHIYAREH Hiroshi NAGAHASHI
A number of parallel algorithms have been developed to solve large-scale real-world problems. Although there has been much work on the design of parallel algorithms, there has been little on the design of languages for expressing these algorithms. This paper describes BPL, a new parallel language designed for butterfly networks. The purpose of this language is to help designers hide the complexity of the algorithm and leave the details of the mapping between data and processors to a lower level. BPL provides a simpler virtual machine for the designer, so that the designer need not think about the control of processors and data. From another point of view, BPL helps the designer to logically check the algorithm and correct any possible errors in it. The paper gives some examples implemented in this language. In addition, we have implemented a software tool that simulates the running of the algorithm on the network. The results lead us to believe that this language would be useful in representing all kinds of algorithms on this network, including normal algorithms and others.
Sang-Woon KIM Jong-Woo LEE Yoshinao AOKI
Sign language can be used as a means of communication between avatars that have no common language. As an attempt to overcome the linguistic barrier, we have previously developed a 2D model-based sign-language chatting system between Korean and Japanese on the Internet. In that system, some problems remained to be solved for natural animation and real-time transmission. In this paper, we employ a 3D character model for stereoscopic gestures in the sign-language animation. We also utilize CG animation techniques that use a variable number of frames and cubic spline interpolation in order to generate realistic gestures. For real-time communication, on the other hand, we make use of an intelligent communication method on a client-server architecture. We implement a preliminary communication system with Visual C++ 5.0 and Open Inventor on Windows platforms. Experimental results show the possibility that the system could be used for avatar communication between different languages.