Keyword Search Result

[Keyword] translation(93hit)

41-60hit(93hit)

  • Estimating Translation Probabilities Considering Semantic Recoverability of Phrase Retranslation

    Hyoung-Gyu LEE  Min-Jeong KIM  YingXiu QUAN  Hae-Chang RIM  So-Young PARK  

     
    LETTER-Natural Language Processing

    Vol: E95-D No:3  Page(s): 897-901

    The general method for estimating phrase translation probabilities consists of sequential processes: word alignment, phrase pair extraction, and phrase translation probability calculation. However, during this sequential process, errors may propagate from the word alignment step through the translation probability calculation step. In this paper, we propose a new method for estimating phrase translation probabilities that reduces the effects of error propagation. By considering the semantic recoverability of phrase retranslation, our method identifies incorrect phrase pairs that have propagated from alignment errors. Furthermore, we define retranslation similarity, which represents the semantic recoverability of phrase retranslation, and use this when computing translation probabilities. Experimental results show that the proposed phrase translation estimation method effectively prevents a PBSMT system from selecting incorrect phrase pairs, and consistently improves the translation quality in various language pairs.

  • An Optimal Algorithm for Searching the Optimal Translation of Query Windows in Quadtree Decomposition

    Hao CHEN  Guangcun LUO  

     
    LETTER-Data Engineering, Web Information Systems

    Vol: E94-D No:10  Page(s): 2043-2047

    One of the efficient methods to build the index of continuous window queries over moving objects is by means of region quadtree index. In this paper, we present an optimal algorithm to search for the optimal position translation of query windows, where the total number of decomposed quadtree blocks for those windows in quadtree representation is minimal. We exploit the branch-and-bound concept to prune the particular paths of recursions in the search space. Evaluation proves that our optimal algorithm reduces search time greatly and the quadtree index based on optimal position translation works efficiently for continuous window queries. To the best of our knowledge, the algorithms and experiments reported in this paper are novel.
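    As a rough illustration of the quantity being minimized, the sketch below counts the maximal quadtree blocks that cover integer-aligned rectangular windows and then tries every shift in a small range. It is a brute-force stand-in, not the branch-and-bound search the abstract describes, and the function names, the half-open window format `(x0, y0, x1, y1)`, and the idea of applying one shared shift to all windows are assumptions made for this example.

```python
def count_blocks(window, size, bx=0, by=0):
    """Count maximal quadtree blocks covering a window inside one block.

    window is a half-open rectangle (x0, y0, x1, y1) on an integer grid;
    the current block has its corner at (bx, by) and side `size`
    (a power of two).
    """
    x0, y0, x1, y1 = window
    # Intersection of the window with the current block.
    ix0, iy0 = max(x0, bx), max(y0, by)
    ix1, iy1 = min(x1, bx + size), min(y1, by + size)
    if ix0 >= ix1 or iy0 >= iy1:
        return 0  # no overlap with this block
    if (ix0, iy0, ix1, iy1) == (bx, by, bx + size, by + size):
        return 1  # block fully covered: one quadtree block
    half = size // 2
    return sum(
        count_blocks(window, half, bx + dx, by + dy)
        for dx in (0, half)
        for dy in (0, half)
    )


def best_shift(windows, size, max_shift):
    """Exhaustively try shifts (dx, dy) and return (total_blocks, dx, dy).

    Brute force for illustration only; shifts that push any window
    outside the grid are skipped.
    """
    best = None
    for dx in range(-max_shift, max_shift + 1):
        for dy in range(-max_shift, max_shift + 1):
            shifted = [
                (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
                for x0, y0, x1, y1 in windows
            ]
            if any(
                x0 < 0 or y0 < 0 or x1 > size or y1 > size
                for x0, y0, x1, y1 in shifted
            ):
                continue
            total = sum(count_blocks(w, size) for w in shifted)
            if best is None or total < best[0]:
                best = (total, dx, dy)
    return best


# A 2x2 window that straddles block boundaries needs four unit blocks;
# shifting it by one cell aligns it with a single 2x2 block.
print(count_blocks((1, 1, 3, 3), 8))
print(best_shift([(1, 1, 3, 3)], 8, 1))
```

    The example shows why translation matters: the same window costs four blocks when misaligned and one block when aligned to the quadtree grid.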

  • A Bayesian Model of Transliteration and Its Human Evaluation When Integrated into a Machine Translation System

    Andrew FINCH  Keiji YASUDA  Hideo OKUMA  Eiichiro SUMITA  Satoshi NAKAMURA  

     
    PAPER

    Vol: E94-D No:10  Page(s): 1889-1900

    The contribution of this paper is two-fold. Firstly, we conduct a large-scale real-world evaluation of the effectiveness of integrating an automatic transliteration system with a machine translation system. A human evaluation is usually preferable to an automatic evaluation, and in the case of this evaluation especially so, since the common machine translation evaluation methods are affected by the length of the translations they are evaluating, often being biassed towards translations in terms of their length rather than the information they convey. We evaluate our transliteration system on data collected in field experiments conducted all over Japan. Our results conclusively show that using a transliteration system can improve machine translation quality when translating unknown words. Our second contribution is to propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the overfitting problem inherent in maximum likelihood training. We demonstrate the effectiveness of our Bayesian segmentation by using it to build a translation model for a phrase-based statistical machine translation (SMT) system trained to perform transliteration by monotonic transduction from character sequence to character sequence. The Bayesian segmentation was used to construct a phrase-table and we compared the quality of this phrase-table to one generated in the usual manner by the state-of-the-art GIZA++ word alignment process used in combination with phrase extraction heuristics from the MOSES statistical machine translation system, by using both to perform transliteration generation within an identical framework. 
    In our experiments on English-Japanese data from the NEWS2010 transliteration generation shared task, we used our technique to bilingually co-segment the training corpus. We then derived a phrase-table from the segmentation from the sample at the final iteration of the training procedure, and the resulting phrase-table was used to directly substitute for the phrase-table extracted by using GIZA++/MOSES. The phrase-table resulting from our Bayesian segmentation model was approximately 30% smaller than that produced by the SMT system's training procedure, and gave an increase in transliteration quality measured in terms of both word accuracy and F-score.

  • Paraphrase Lattice for Statistical Machine Translation

    Takashi ONISHI  Masao UTIYAMA  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E94-D No:6  Page(s): 1299-1305

    Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. In this paper, we show that lattice decoding is also useful for handling input variations. “Input variations” refers to the differences in input texts with the same meaning. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to a lattice decoder. The lattice decoder searches for the best path of the paraphrase lattice and outputs the best translation. Experimental results using the IWSLT dataset and the Europarl dataset show that our proposed method obtains significant gains in BLEU scores.

  • Translation of State Machines from Equational Theories into Rewrite Theories with Tool Support

    Min ZHANG  Kazuhiro OGATA  Masaki NAKAMURA  

     
    PAPER-Specification Translation

    Vol: E94-D No:5  Page(s): 976-988

    This paper presents a strategy together with tool support for the translation of state machines from equational theories into rewrite theories, aiming at automatically generating rewrite theory specifications. Duplicate effort in specifying state machines in both equational theories and rewrite theories can be saved when we combine the theorem proving facilities of CafeOBJ with the model checking facilities of Maude. Experimental results show that the specifications generated by the proposed strategy are significantly more efficient than those generated by three other existing translation strategies.

  • Integration of Multiple Bilingually-Trained Segmentation Schemes into Statistical Machine Translation

    Michael PAUL  Andrew FINCH  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E94-D No:3  Page(s): 690-697

    This paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous source language text in order to improve the translation quality of statistical machine translation (SMT) approaches. The method can be applied to any language pair in which the source language is unsegmented and the target language segmentation is known. In the first step, an iterative bootstrap method is applied to learn multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, multiple segmentation schemes are integrated into a single SMT system by characterizing the source language side and merging identical translation pairs of differently segmented SMT models. Experimental results translating five Asian languages into English revealed that the proposed method of integrating multiple segmentation schemes outperforms SMT models trained on any of the learned word segmentations and performs comparably to available monolingually built segmentation tools.

  • An Empirical Study of FTL Performance in Conjunction with File System Pursuing Data Integrity

    In Hwan DOH  Myoung Sub SHIM  Eunsam KIM  Jongmoo CHOI  Donghee LEE  Sam H. NOH  

     
    LETTER-Software System

    Vol: E93-D No:8  Page(s): 2302-2305

    Due to the detachability of Flash storage, which is a dominant form of portable storage, the integrity of data stored in Flash storage has become an important issue. This study considers the performance of Flash Translation Layer (FTL) schemes embedded in Flash storage in conjunction with file system behavior that pursues high data integrity. To assure extreme data integrity, file systems synchronously write all file data to storage, which is accompanied by hot write references. In this study, we concentrate on the effect of hot write references on Flash storage, and we consider the effect of absorbing the hot write references via a nonvolatile write cache on the performance of the FTL schemes in Flash storage. In so doing, we quantify the performance of typical FTL schemes for a realistic digital camera workload that contains hot write references through experiments in a real system environment. Results show that for the workload with hot write references, FTL performance does not conform with previously reported studies. We also conclude that the impact of the underlying FTL schemes on the performance of Flash storage is dramatically reduced by absorbing the hot write references via a nonvolatile write cache.

  • NVFAT: A FAT-Compatible File System with NVRAM Write Cache for Its Metadata

    In Hwan DOH  Hyo J. LEE  Young Je MOON  Eunsam KIM  Jongmoo CHOI  Donghee LEE  Sam H. NOH  

     
    PAPER-Software Systems

    Vol: E93-D No:5  Page(s): 1137-1146

    File systems make use of the buffer cache to enhance their performance. Traditionally, part of DRAM, which is volatile memory, is used as the buffer cache. In this paper, we consider the use of Non-Volatile RAM (NVRAM) as a write cache for metadata of the file system in embedded systems. NVRAM is a state-of-the-art memory that provides characteristics of both non-volatility and random byte addressability. By employing NVRAM as a write cache for dirty metadata, we retain the same integrity of a file system that always synchronously writes its metadata to storage, while at the same time improving file system performance to the level of a file system that always writes asynchronously. To show quantitative results, we developed an embedded board with NVRAM and modified the VFAT file system provided in Linux 2.6.11 to accommodate the NVRAM write cache. We performed a wide range of experiments on this platform for various synthetic and realistic workloads. The results show that substantial reductions in execution time are possible from an application viewpoint. Another consequence of the write cache is its benefits at the FTL layer, leading to improved wear leveling of Flash memory and increased energy savings, which are important measures in embedded systems. From the real numbers obtained through our experiments, we show that wear leveling is improved considerably and also quantify the improvements in terms of energy.

  • Class-Dependent Modeling for Dialog Translation

    Andrew FINCH  Eiichiro SUMITA  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

    Vol: E92-D No:12  Page(s): 2469-2477

    This paper presents a technique for class-dependent decoding for statistical machine translation (SMT). The approach differs from previous methods of class-dependent translation in that the class-dependent forms of all models are integrated directly into the decoding process. We employ probabilistic mixture weights between models that can change dynamically on a sentence-by-sentence basis depending on the characteristics of the source sentence. The effectiveness of this approach is demonstrated by evaluating its performance on travel conversation data. We used this approach to tackle the translation of questions and declarative sentences using class-dependent models. To achieve this, our system integrated two sets of models specifically built to deal with sentences that fall into one of two classes of dialog sentence: questions and declarations, with a third set of models built with all of the data to handle the general case. The technique was thoroughly evaluated on data from 16 language pairs using 6 machine translation evaluation metrics. We found the results were corpus-dependent, but in most cases our system was able to improve translation performance, and for some languages the improvements were substantial.

  • A Reordering Model Using a Source-Side Parse-Tree for Statistical Machine Translation

    Kei HASHIMOTO  Hirofumi YAMAMOTO  Hideo OKUMA  Eiichiro SUMITA  Keiichi TOKUDA  

     
    PAPER-Machine Translation

    Vol: E92-D No:12  Page(s): 2386-2393

    This paper presents a reordering model using a source-side parse-tree for phrase-based statistical machine translation. The proposed model is an extension of IST-ITG (imposing source tree on inversion transduction grammar) constraints. In the proposed method, the target-side word order is obtained by rotating nodes of the source-side parse-tree. We modeled the node rotation, monotone or swap, using word alignments based on a training parallel corpus and source-side parse-trees. The model efficiently suppresses erroneous target word orderings, especially global orderings. Furthermore, the proposed method conducts a probabilistic evaluation of target word reorderings. In English-to-Japanese and English-to-Chinese translation experiments, the proposed method resulted in a 0.49-point improvement (29.31 to 29.80) and a 0.33-point improvement (18.60 to 18.93) in word BLEU-4 compared with IST-ITG constraints, respectively. This indicates the validity of the proposed reordering model.

  • Translation of Untranslatable Words -- Integration of Lexical Approximation and Phrase-Table Extension Techniques into Statistical Machine Translation

    Michael PAUL  Karunesh ARORA  Eiichiro SUMITA  

     
    PAPER-Machine Translation

    Vol: E92-D No:12  Page(s): 2378-2385

    This paper proposes a method for handling out-of-vocabulary (OOV) words that cannot be translated using conventional phrase-based statistical machine translation (SMT) systems. For a given OOV word, lexical approximation techniques are utilized to identify spelling and inflectional word variants that occur in the training data. All OOV words in the source sentence are then replaced with appropriate word variants found in the training corpus, thus reducing the number of OOV words in the input. Moreover, in order to increase the coverage of such word translations, the SMT translation model is extended by adding new phrase translations for all source language words that do not have a single-word entry in the original phrase-table but only appear in the context of larger phrases. The effectiveness of the proposed methods is investigated for the translation of Hindi to English, Chinese, and Japanese.

  • State-of-the-Art Word Reordering Approaches in Statistical Machine Translation: A Survey

    Marta R. COSTA-JUSSA  Jose A. R. FONOLLOSA  

     
    SURVEY PAPER-Natural Language Processing

    Vol: E92-D No:11  Page(s): 2179-2185

    This paper surveys several state-of-the-art reordering techniques employed in Statistical Machine Translation systems. Reordering is understood as the word-order redistribution of the translated words. In original SMT systems, this different order is only modeled within the limits of translation units. Relying only on the reordering provided by translation units may not be good enough in most language pairs, which might require longer reorderings. Therefore, additional techniques may be deployed to face the reordering challenge. The Statistical Machine Translation community has been very active recently in developing reordering techniques. This paper gives a brief survey and classification of several well-known reordering approaches.

  • A Technique for Defining Metamodel Translations

    Iván GARCÍA-MAGARIÑO  Rubén FUENTES-FERNÁNDEZ  

     
    PAPER-Fundamentals of Software and Theory of Programs

    Vol: E92-D No:10  Page(s): 2043-2052

    Model-Driven Engineering and Domain-Specific Modeling Languages are encouraging an increased use of metamodels for the definition of languages and tools. Although the Meta Object Facility language is the standard for metamodeling, there are alternative metamodeling languages that are aimed at satisfying specific requirements. In this context, sharing information throughout different domains and tools requires not only being able to translate models between modeling languages defined with the same metamodeling language, but also between different metamodeling languages. This paper addresses this latter need by describing a general technique to define transformations that perform this translation. In this work, two case studies illustrate the application of this process.

  • Imposing Constraints from the Source Tree on ITG Constraints for SMT

    Hirofumi YAMAMOTO  Hideo OKUMA  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E92-D No:9  Page(s): 1762-1770

    In current statistical machine translation (SMT), erroneous word reordering is one of the most serious problems. To resolve this problem, many word-reordering constraint techniques have been proposed. Inversion transduction grammar (ITG) is one of these constraints. In ITG constraints, target-side word order is obtained by rotating nodes of the source-side binary tree. In these node rotations, the source binary tree instance is not considered. Therefore, stronger constraints for word reordering can be obtained by imposing further constraints derived from the source tree on the ITG constraints. For example, for the source word sequence { a b c d }, ITG constraints allow a total of twenty-two target word orderings. However, when the source binary tree instance ((a b) (c d)) is given, our proposed "imposing source tree on ITG" (IST-ITG) constraints allow only eight word orderings. The reduction in the number of word-order permutations by our proposed stronger constraints efficiently suppresses erroneous word orderings. In our experiments with IST-ITG using the NIST MT08 English-to-Chinese translation track's data, the proposed method resulted in a 1.8-point improvement in character BLEU-4 (35.2 to 37.0) and a 6.2-point lower CER (74.1% to 67.9%) compared with our baseline condition.
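    The two counts quoted in the abstract (twenty-two orderings under ITG, eight under IST-ITG) can be reproduced with a small enumeration. The sketch below is only an illustration of the counting argument, not the authors' decoder: `itg_orders` tries every binary bracketing of the source sequence, while `ist_itg_orders` rotates the nodes of one fixed tree. Both function names and the nested-tuple tree encoding are assumptions made for this example.

```python
def itg_orders(words):
    """All target orders reachable under ITG constraints.

    Every binary bracketing of the source sequence is allowed, and each
    node may keep its children in order (monotone) or swap them.
    """
    words = tuple(words)
    if len(words) == 1:
        return {words}
    orders = set()
    for k in range(1, len(words)):  # every split point = every bracketing
        for left in itg_orders(words[:k]):
            for right in itg_orders(words[k:]):
                orders.add(left + right)  # monotone
                orders.add(right + left)  # swap
    return orders


def ist_itg_orders(tree):
    """Target orders reachable by rotating nodes of one fixed binary tree.

    A tree is either a word (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {(tree,)}
    left, right = tree
    orders = set()
    for l in ist_itg_orders(left):
        for r in ist_itg_orders(right):
            orders.add(l + r)  # monotone
            orders.add(r + l)  # swap
    return orders


print(len(itg_orders("abcd")))                            # 22
print(len(ist_itg_orders((("a", "b"), ("c", "d")))))      # 8
```

    Of the 24 permutations of four words, ITG excludes only the two "inside-out" orders (b d a c and c a d b); fixing the tree ((a b) (c d)) leaves one binary choice at each of its three internal nodes, hence 2^3 = 8 orders.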

  • Consolidation-Based Speech Translation and Evaluation Approach

    Chiori HORI  Bing ZHAO  Stephan VOGEL  Alex WAIBEL  Hideki KASHIOKA  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

    Vol: E92-D No:3  Page(s): 477-488

    The performance of speech translation systems combining automatic speech recognition (ASR) and machine translation (MT) systems is degraded by redundant and irrelevant information caused by speaker disfluency and recognition errors. This paper proposes a new approach to translating speech recognition results through speech consolidation, which removes ASR errors and disfluencies and extracts meaningful phrases. A consolidation approach is spun off from speech summarization by word extraction from ASR 1-best. We extended the consolidation approach to confusion networks (CNs), tested its performance using TED speech, and confirmed that the consolidation results preserved more meaningful phrases in comparison with the original ASR results. We applied the consolidation technique to speech translation. To test the performance of consolidation-based speech translation, Chinese broadcast news (BN) speech in RT04 was recognized, consolidated and then translated. The speech translation results via consolidation cannot be directly compared with gold standards in which all words in speech are translated because consolidation-based translations are partial translations. We propose a new evaluation framework for partial translations that compares them with the most similar set of words extracted from a word network created by merging gradual summarizations of the gold standard translation. The performance of consolidation-based MT results was evaluated using BLEU. We also propose Information Preservation Accuracy (IPAccy) and Meaning Preservation Accuracy (MPAccy) to evaluate consolidation and consolidation-based MT. We confirmed that consolidation contributed to the performance of speech translation.

  • Training Set Selection for Building Compact and Efficient Language Models

    Keiji YASUDA  Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E92-D No:3  Page(s): 506-511

    For statistical language model training, target domain matched corpora are required. However, training corpora sometimes include both target domain matched and unmatched sentences. In such a case, training set selection is effective for both reducing model size and improving model performance. In this paper, a training set selection method for statistical language model training is described. The method provides two advantages for training a language model. One is its capacity to improve the language model performance, and the other is its capacity to reduce computational loads for the language model. The method has four steps. 1) Sentence clustering is applied to all available corpora. 2) Language models are trained on each cluster. 3) Perplexity on the development set is calculated using the language models. 4) For the final language model training, we use the clusters whose language models yield low perplexities. The experimental results indicate that the language model trained on the data selected by our method gives lower perplexity on an open test set than a language model trained on all available corpora.
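    Steps 2 to 4 of the method can be sketched in a few lines. The example below is a toy under stated assumptions, not the paper's setup: the clusters are supplied by hand (step 1, sentence clustering, is taken as already done), the language model is an add-one-smoothed unigram model chosen for brevity, and "low perplexity" is interpreted as keeping the `keep` best-scoring clusters. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Step 2: train a unigram model (word counts plus total token count)."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())


def perplexity(model, sentences, vocab_size):
    """Step 3: perplexity of the sentences under an add-one-smoothed model."""
    counts, total = model
    log_prob, n_tokens = 0.0, 0
    for s in sentences:
        for w in s.split():
            log_prob += math.log((counts[w] + 1) / (total + vocab_size))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


def select_clusters(clusters, dev, keep):
    """Step 4: return the ids of the `keep` clusters with lowest dev perplexity."""
    vocab = {w for sents in clusters.values() for s in sents for w in s.split()}
    vocab |= {w for s in dev for w in s.split()}
    ranked = sorted(
        clusters,
        key=lambda c: perplexity(train_unigram(clusters[c]), dev, len(vocab)),
    )
    return ranked[:keep]


# Hypothetical pre-clustered corpus and development set.
clusters = {
    "travel": ["where is the station", "where is the hotel"],
    "finance": ["stocks fell sharply today", "the bank raised rates"],
}
dev = ["where is the airport"]
print(select_clusters(clusters, dev, keep=1))
```

    The final language model would then be trained on the sentences of the selected clusters only, which is where the reduction in model size comes from.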

  • Name-Based Address Mapping for Virtual Private Networks

    Peter SURANYI  Yasushi SHINJO  Kazuhiko KATO  

     
    PAPER-Internet

    Vol: E92-B No:1  Page(s): 200-208

    IPv4 private addresses are commonly used in local area networks (LANs). With the increasing popularity of virtual private networks (VPNs), it has become common that a user connects to multiple LANs at the same time. However, private address ranges for LANs frequently overlap. In such cases, existing systems do not allow the user to access the resources on all LANs at the same time. In this paper, we propose name-based address mapping for VPNs, a novel method that allows connecting to hosts through multiple VPNs at the same time, even when the address ranges of the VPNs overlap. In name-based address mapping, rather than using the IP addresses used on the LANs (the real addresses), we assign a unique virtual address to each remote host based on its domain name. The local host uses the virtual addresses to communicate with remote hosts. We have implemented name-based address mapping for layer 3 OpenVPN connections on Linux and measured its performance. The communication overhead of our system is less than 1.5% for throughput and less than 0.2 ms for each name resolution.
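    The core idea, keying the address on the host's domain name rather than on its (possibly overlapping) real address, can be sketched as a small lookup table. This is a minimal illustration and not the authors' OpenVPN implementation: the class name, the virtual pool `10.200.0.0/16`, the VPN labels, and the example host names are all assumptions made for the example.

```python
import ipaddress


class NameBasedMapper:
    """Assign each remote host a unique virtual address keyed by its name."""

    def __init__(self, virtual_pool="10.200.0.0/16"):
        # Iterator over unused virtual addresses (the pool is hypothetical).
        self._free = iter(ipaddress.ip_network(virtual_pool).hosts())
        self._by_name = {}   # domain name -> virtual address
        self._to_real = {}   # virtual address -> (vpn, real address)

    def resolve(self, name, vpn, real_address):
        """Return the virtual address for `name`, allocating one if needed."""
        name = name.lower()
        if name not in self._by_name:
            virtual = next(self._free)
            self._by_name[name] = virtual
            self._to_real[virtual] = (vpn, ipaddress.ip_address(real_address))
        return self._by_name[name]

    def route(self, virtual):
        """Map a virtual address back to the VPN and real address behind it."""
        return self._to_real[ipaddress.ip_address(str(virtual))]


mapper = NameBasedMapper()
# Two hosts on different LANs share the same private address.
a = mapper.resolve("printer.office-a.example", "vpnA", "192.168.0.10")
b = mapper.resolve("nas.office-b.example", "vpnB", "192.168.0.10")
print(a, mapper.route(a))
print(b, mapper.route(b))
```

    Because the local host only ever sees the virtual addresses, the overlap between the two LANs' private ranges never becomes visible to applications.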

  • Introducing a Translation Dictionary into Phrase-Based SMT

    Hideo OKUMA  Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

    Vol: E91-D No:7  Page(s): 2051-2057

    This paper presents a method to effectively introduce a translation dictionary into phrase-based SMT. Though SMT systems can be built with only a parallel corpus, translation dictionaries are more widely available and have many more entries than parallel corpora. A simple and low-cost method to introduce a translation dictionary is to attach its entries to a phrase table. This, however, does not work well. Target word order and even whole target sentences are often incorrect. To solve this problem, the proposed method uses high-frequency words in the training corpus. The high-frequency words may already be trained well; in other words, they may appear in the phrase table and therefore be translated with correct word order. Experimental results show the proposed method to be far superior to simply attaching dictionary entries to phrase tables.

  • A Specification Translation from Behavioral Specifications to Rewrite Specifications

    Masaki NAKAMURA  Weiqiang KONG  Kazuhiro OGATA  Kokichi FUTATSUGI  

     
    PAPER-Fundamentals of Software and Theory of Programs

    Vol: E91-D No:5  Page(s): 1492-1503

    There are two ways to describe a state machine as an algebraic specification: a behavioral specification and a rewrite specification. In this study, we propose a translation system from behavioral specifications to rewrite specifications to obtain a verification system which has the strong points of verification techniques for both specifications. Since our translation system is complete with respect to invariant properties, it helps us to obtain a counter-example for an invariant property through automatic exhaustive searching for a rewrite specification.

  • Bilingual Cluster Based Models for Statistical Machine Translation

    Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Applications

    Vol: E91-D No:3  Page(s): 588-597

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.
