Ryo NAGATA Atsuo KAWAI Koichiro MORIHIRO Naoki ISU
This paper proposes a method for reinforcing noun countability prediction, which plays a crucial role in determining correct determiners in machine translation and error detection. The proposed method reinforces countability prediction by introducing a novel heuristic called one countability per discourse. The heuristic claims that when a noun appears more than once in a discourse, all instances share the same countability. The basic idea of the proposed method is that mispredictions can be corrected by efficiently using the one-countability-per-discourse heuristic. Experiments show that the proposed method successfully reinforces countability prediction and outperforms other methods used for comparison. In addition to its performance, it has two advantages over earlier methods: (i) it is applicable to any countability prediction method, and (ii) it requires no human intervention to reinforce countability prediction.
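As a rough illustration of the one-countability-per-discourse idea, the sketch below relabels each noun in a discourse with the label most often predicted for it. The majority vote, the function name `one_countability_per_discourse`, and the `"count"`/`"mass"` labels are assumptions made for this example; the abstract does not state how the paper itself applies the heuristic.

```python
from collections import Counter, defaultdict


def one_countability_per_discourse(predictions):
    """Relabel noun instances so that each noun has one countability per discourse.

    predictions: list of (noun, label) pairs from any base predictor, in
    discourse order. Majority voting is an assumption of this sketch, not
    necessarily the procedure used in the paper.
    """
    votes = defaultdict(Counter)
    for noun, label in predictions:
        votes[noun][label] += 1

    majority = {}
    for noun, counter in votes.items():
        ranked = counter.most_common()
        # On a tie there is no clear majority, so the base predictions are kept.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            majority[noun] = None
        else:
            majority[noun] = ranked[0][0]

    return [(noun, majority[noun] or label) for noun, label in predictions]
```

Because the function only consumes (noun, label) pairs, it can be placed after any base predictor, which mirrors advantage (i) stated in the abstract.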
Taiji SASAOKA Hideyuki KAWABATA Toshiaki KITAMURA
Parallel programs for distributed-memory machines are not easy to create and maintain, especially when they involve sparse matrix computations. In this paper, we propose a program translation system for generating parallel sparse matrix computation codes utilizing PSBLAS. The purpose of the system is to offer the user a convenient way to construct parallel sparse code based on PSBLAS. The system is built on the idea of bridging the gap between easy-to-read program representations and highly tuned parallel executables based on existing parallel sparse matrix computation libraries. The system accepts a MATLAB program with annotations and generates subroutines for an SPMD-style parallel program which runs on distributed-memory machines. Experimental results on parallel machines show that the prototype of our system can generate fairly efficient PSBLAS codes for simple applications such as CG and Bi-CGSTAB programs.
Jong-Hoon OH Key-Sun CHOI Hitoshi ISAHARA
Technical terms are linguistic representations of a domain concept, and their constituents are components used to represent the concept. Technical terms are usually multi-word terms, and their meanings can be inferred from their constituents. Therefore, term constituents are essential for understanding the designated meaning of technical terms. However, there are several problems in finding the correct meanings of technical terms from their term constituents. First, because a term constituent is usually a morphological unit rather than a conceptual unit in the case of Korean technical terms, we need to first identify conceptual units by chunking term constituents. Second, conceptual units are sometimes homonyms or synonyms. Moreover, their meanings show domain dependency. It is therefore necessary to give information about conceptual units and their possible meanings, including homonyms, synonyms, and domain dependency, so that natural language applications can properly handle technical terms. In this paper, we propose a term constituent alignment algorithm that extracts such information from bilingual technical term pairs. Our algorithm recognizes conceptual units and their meanings by finding English term constituents and their corresponding Korean term constituents for given English-Korean term pairs. Our experimental results indicate that this method can effectively find conceptual units and their meanings, with an alignment error rate (AER) of about 6% on manually analyzed experimental data and about 14% on automatically analyzed experimental data.
Sanghwa YUH Kongjoo LEE Jungyun SEO
In this paper, we present a Korean-to-Chinese/English/Japanese multilingual Machine Translation (MT) system for closed captions in Digital Television (DTV). Preliminary experiments on closed caption translation with existing base MT systems showed unsatisfactory results. In order to achieve more accurate translation with the base MT systems, we adopted live resources of multilingual Named Entities and their translingual equivalences from the Web. We also utilize the program information, which the terrestrial broadcasters offer through the DTV transport stream, in order to use program-specific dictionaries, including the names of characters, locations and organizations. Two more components are adopted to reduce ambiguity in parsing and word sense disambiguation: sentence simplification for long-sentence segmentation and dynamic domain identification for automatic domain dictionary stacking. With these integrated approaches, we raised the Mean Opinion Score (MOS) of translation accuracy by 0.40 over the base MT systems.
This paper proposes a metric for example matching in example-based machine translation. Our metric, which serves as a similarity measure, is employed to retrieve the examples most similar to a given query. Basically, it makes use of simple information such as the lemma and part-of-speech information of typographically mismatched words. In addition, it uses the contiguity information of matched word units to capture the full context. Finally, we show results on the correctness of the proposed metric.
Ki-Young LEE Sang-Kyu PARK Han-Woo KIM
Target word selection is one of the most important and difficult tasks in English-Korean Machine Translation. It affects the overall translation accuracy of machine translation systems. In this paper, we present a new approach to Korean target word selection for an English noun with translation ambiguities, using multiple knowledge sources such as verb frame patterns, sense vectors based on collocations, statistical Korean local context information and co-occurring POS information. Verb frame patterns constructed from a dictionary and a corpus play an important role in resolving the sparseness problem of collocation data. A sense vector is a set of collocation data for the case in which an English word with target selection ambiguity is translated into a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. The co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. To evaluate our approach, we applied the method to the Tellus-EK system, an English-Korean automatic translation system currently being developed at ETRI [1],[2]. The experiment showed promising results for diverse sentences from web documents.
NAT-PT and DSTM are becoming more widespread as de facto standards for IPv6-dominant network deployment. However, few researchers have empirically evaluated their performance. In this paper, we compare the performance of NAT-PT and DSTM with that of IPv4-only and IPv6-only networks on user applications, using metrics such as throughput, CPU utilization, round-trip time, and connect/request/response transaction rate.
Machine transliteration is an automatic method for generating characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Previous work focuses on either a grapheme-based or a phoneme-based method. However, transliteration is both an orthographical and a phonetic conversion process. Therefore, both grapheme and phoneme information should be considered in machine transliteration. In this paper, we propose a grapheme- and phoneme-based transliteration model and compare it with previous grapheme-based and phoneme-based models using several machine learning techniques. Our method shows about 13-78% performance improvement.
In order to boost the translation quality of corpus-based MT systems for speech translation, the technique of splitting an input utterance appears promising. In previous research, many methods used word-sequence characteristics like N-gram clues among splitting positions. In this paper, to supplement splitting methods based on word-sequence characteristics, we introduce another clue using similarity based on edit-distance. In our splitting method, we generate candidates for utterance splitting based on N-grams, and select the best one by measuring the utterance similarity against a corpus. This selection is founded on the assumption that a corpus-based MT system can correctly translate an utterance that is similar to an utterance in its training corpus. We conducted experiments using three MT systems: two EBMT systems, one of which uses a phrase as a translation unit and the other of which uses an utterance, and an SMT system. The translation results under various conditions were evaluated by objective measures and a subjective measure. The experimental results demonstrate that the proposed method is valuable for the three systems. Using utterance similarity can improve the translation quality.
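The selection step described above can be sketched as follows: given several candidate splits of an utterance, score each segment by its edit-distance similarity to the nearest corpus utterance and keep the best-scoring split. The normalization by the longer length, the averaging over segments, and the names `edit_distance`, `similarity` and `select_split` are assumptions of this sketch; the abstract does not give the exact scoring formula, and candidate generation by N-grams is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(segment, corpus):
    """Similarity to the nearest corpus utterance: 1 - normalized edit distance."""
    best = 0.0
    for ref in corpus:
        denom = max(len(segment), len(ref)) or 1
        best = max(best, 1.0 - edit_distance(segment, ref) / denom)
    return best


def select_split(candidates, corpus):
    """Pick the candidate split whose segments are, on average, closest to the corpus.

    candidates: list of splits; each split is a list of segments (lists of words).
    Averaging segment similarities is an assumption of this sketch.
    """
    def score(split):
        return sum(similarity(seg, corpus) for seg in split) / len(split)
    return max(candidates, key=score)
```

For example, with a corpus containing "thank you" and "where is the station", the split of "thank you where is the station" after "you" scores 1.0 and is preferred over leaving the utterance unsplit.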
Keiji YASUDA Fumiaki SUGAYA Toshiyuki TAKEZAWA Genichiro KIKUI Seiichi YAMAMOTO Masuzo YANAGIDA
In this paper we propose an objective method for assessing the capability of a speech translation system. It automates the translation paired comparison method proposed by Sugaya et al., which succinctly evaluates a speech translation system by giving a simple, easy-to-understand TOEIC score. To avoid the high evaluation cost of the original method, which requires large manual effort, the new objective method automates the procedure by employing an objective metric such as BLEU or a DP-based measure. The evaluation results obtained by the proposed method are similar to those of the original method. The proposed method is also used to evaluate the usefulness of a speech translation system. We find that our speech translation system is useful in general, even to users with a higher TOEIC score than the system's.
Shu-Min TSAI Jia-Ching WANG Jar-Ferr YANG Jhing-Fa WANG
In this paper, we propose a speech coding translation scheme that transfers coding parameters between GSM half rate and G.729 coders. Compared to the conventional decode-then-encode (DTE) scheme, the proposed parameter conversions provide speech interoperability between mobile and IP networks while reducing computational complexity and coding delay. Simulation results show that the proposed methods can reduce the computational load and coding delay incurred in the target encoders by about 30% and achieve almost imperceptible degradation in performance.
The virtual memory functions of real-time operating systems have been used in embedded systems. Recent RISC processors provide virtual memory support through a software-managed Translation Lookaside Buffer (TLB). For the real-time behavior of embedded systems, managing TLB entries is of primary importance because the overhead incurred at a TLB miss greatly affects the overall performance of the system. In this paper, we propose several TLB management algorithms for MIPS processors. In these algorithms, a replaced TLB entry is either randomly chosen or managed. We analyze the algorithms by comparing overheads at task switching times and TLB miss times.
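To make the "randomly chosen" replacement policy concrete, the following is a small Python model of a software-managed TLB that evicts a random entry on a miss when full and counts misses. It is only a sketch: the class name `RandomTLB`, the dictionary-based page table, and the seeded generator are assumptions made for illustration, and it does not model the MIPS refill handler or the paper's actual algorithms.

```python
import random


class RandomTLB:
    """Toy model of a software-managed TLB with random replacement."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.entries = {}                # virtual page number -> physical frame number
        self.rng = random.Random(seed)   # seeded so runs are repeatable
        self.misses = 0

    def translate(self, vpn, page_table):
        """Return the frame for vpn, refilling the TLB from page_table on a miss."""
        if vpn in self.entries:
            return self.entries[vpn]     # TLB hit: no refill overhead
        self.misses += 1                 # TLB miss: software refill is needed
        if len(self.entries) >= self.capacity:
            victim = self.rng.choice(list(self.entries))
            del self.entries[victim]     # evict a randomly chosen entry
        self.entries[vpn] = page_table[vpn]
        return self.entries[vpn]

    def flush(self):
        """Drop all entries, for example to model a task switch."""
        self.entries.clear()
```

Counting misses over a recorded access trace, with and without `flush()` calls at task switches, gives a simple way to compare the two overheads mentioned in the abstract.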
Chang-Jae PARK Ando KI In-Cheol PARK Chong-Min KYUNG
This paper describes an automatic interface insertion scheme for in-system verification of algorithm models. To insert the interface, an algorithm model described in C is translated into another source code that includes the communication with hardware components in the target system to be validated with the algorithm model. The communication between the algorithm model and hardware components is achieved using transactors that perform transformation between access operations and bus cycle transactions. An I/O terminal is introduced as an interface model to relate the transactions to access operations during the execution of the algorithm model, i.e., accesses to I/O terminals invoke bus cycle transactions in hardware and vice versa. An automatic interface insertion tool is developed using source-to-source translation to identify the I/O terminals and insert interface function calls in the source code. The proposed automatic interface insertion scheme is validated by emulating several multimedia algorithms written in C on real target systems.
Sin-Jae KANG You-Jin CHUNG Jong-Hyeok LEE
This paper presents a method for disambiguating word senses in Korean-Japanese machine translation by using a language-independent ontology. This ontology stores semantic constraints between concepts and other world knowledge, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In order to acquire a language-independent and reasonably practical ontology in a limited time and with less manpower, we extend the existing Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built electronic dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus. In practical machine translation systems, our word sense disambiguation method improved average precision by 6.0% for Japanese analysis and by 9.2% for Korean analysis over the method without an ontology.
Katsunori YAMASAKI Yoshichika SODESHIMA
In this paper we introduce a bottom-up pushdown tree transducer (b-PDTT), which is a bottom-up tree transducer with pushdown storage (where the pushdown storage stores trees) and may be considered a dual concept of the top-down pushdown tree transducer (t-PDTT). After proving some fundamental properties of b-PDTTs (for example, that any b-PDTT can be realized by a linear stack with a single state and converted into G-type normal form, which corresponds to Greibach normal form in a context-free grammar), we compare the translational capability of a b-PDTT with that of a t-PDTT.
Atsushi NAKAMURA Masaki NAITO Hajime TSUKADA Rainer GRUHN Eiichiro SUMITA Hideki KASHIOKA Hideharu NAKAJIMA Tohru SHIMIZU Yoshinori SAGISAKA
This paper describes an application of a speech translation system to another task/domain in the real world by using developmental data collected from real-world interactions. The total cost of this task alteration was calculated to be 9 person-months. The newly applied system was also evaluated by using speech data collected from real-world interactions. For real-world speech having a machine-friendly speaking style, the newly applied system could recognize typical sentences with a word accuracy of 90% or better. We also found that, concerning the overall speech translation performance, the system could translate about 80% of the input Japanese speech into acceptable English sentences.
In this paper, some properties of domain tree languages of top-down pushdown tree transducers (domain(t-PDTT) or t-PDTTD) are shown. It is shown that (1) for any L1, L2 in context-free language (CFL), L1 ∩ L2 ∈ yielde(t-PDTTD) (where yielde is an extended yield), (2) yielde(t-PDTTε0DF) is closed under homomorphisms, where t-PDTTε0 is a t-PDTT which cannot proceed generations after reading a constant symbol σ and t-PDTTε0DF denotes a domain tree language of t-PDTTε0 with a final state translation, and (3) yielde(t-PDTTε0DF) is the class of recursively enumerable languages, and consequently yielde(t-PDTTD) is the class of recursively enumerable languages.
Sang-Woon KIM Ji-Young OH Shin TANAHASHI Yoshinao AOKI
In order to investigate the possibility of avatar communication using sign language, in this paper we develop a sign-language chatting system on the Internet between Korea and Japan using CG animation techniques. We construct the system in a server-client architecture, where images of Korean or Japanese sign language are analyzed by the server into a series of parameters for sign-language animation. We transmit the parameters, which are text data rather than images or their compressed forms, to clients and regenerate the corresponding CG animation using the received data. The chatting system is implemented with Visual C++ 5.0 on Windows platforms. Experimental results show that sign language could be used as a means of communication between avatars of different languages.
Shigeki MATSUBARA Yasuyoshi INAGAKI
Since spontaneously spoken language expressions appear continuously, the transfer stage of a spoken language machine translation system has to work incrementally. In such a system, a high degree of incrementality is required even more strongly than high quality. This paper proposes an incremental machine translation system, which translates English spoken words into Japanese in the order in which they appear. The system is composed of three modules: incremental parsing, transfer and generation, which work synchronously. The transfer module utilizes some features and phenomena characterizing spoken Japanese: flexible word order, ellipses, repetitions and so forth. This is motivated by the observation that such characteristics frequently appear in Japanese uttered by English-Japanese interpreters. Their frequent utilization is the key to the success of highly incremental translation between English and Japanese, which have different word orders. We have implemented a prototype system, Sync/Trans, which parses English dialogues incrementally and generates Japanese immediately. To evaluate Sync/Trans, we conducted an experiment with conversations consisting of 27 dialogues and 218 sentences. Of these sentences, 190 are correct, giving a success rate of 87.2%. This result shows our incremental method to be a promising technique for spoken language translation with acceptable accuracy and a highly real-time nature.
Edoardo CHARBON Enrico MALAVASI Paolo MILIOZZI Alberto SANGIOVANNI-VINCENTELLI
In this paper we propose a comprehensive approach to physical design based on the constraint paradigm. Bounds on the most critical circuit parasitics are automatically generated to help designers and/or physical design tools meet a set of high-level specifications. The constraint generation engine is based on constrained optimization, where various parasitic effects on interconnect and devices are accounted for and dealt with in different manners according to their statistical behavior and their effect on performance.