
Keyword Search Result

[Keyword] disambiguation (14 hits)

Results 1-14 of 14
  • Korean-Vietnamese Neural Machine Translation with Named Entity Recognition and Part-of-Speech Tags

    Van-Hai VU  Quang-Phuoc NGUYEN  Kiem-Hieu NGUYEN  Joon-Choul SHIN  Cheol-Young OCK  

     
    PAPER-Natural Language Processing

      Publicized:
    2020/01/15
      Vol:
    E103-D No:4
      Page(s):
    866-873

    Since deep learning was introduced, a series of achievements has been published in the field of automatic machine translation (MT). However, Korean-Vietnamese MT systems face many challenges because of a lack of data, multiple meanings of individual words, and grammatical diversity that depends on context. Therefore, the quality of Korean-Vietnamese MT systems is still sub-optimal. This paper discusses a method for applying Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to Vietnamese sentences to improve the performance of Korean-Vietnamese MT systems. In terms of implementation, we used a tool to tag NER and POS in Vietnamese sentences. In addition, we had access to a Korean-Vietnamese parallel corpus of more than 450K sentence pairs from our previous research. The experimental results indicate that tagging NER and POS in Vietnamese sentences can improve the quality of Korean-Vietnamese neural MT (NMT) in terms of Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores. On average, our MT system improved by 1.21 BLEU points or 2.33 TER points after applying both NER and POS tagging to the Vietnamese corpus. Due to the structural features of the languages, MT systems in the Korean-to-Vietnamese direction always give better BLEU and TER results than systems in the reverse direction.
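
    As a hedged illustration of the preprocessing step described above, the sketch below interleaves each Vietnamese token with a POS and NER tag before the sentences would be passed to an NMT toolkit. The tagging functions, tag inventory, and the "token|POS|NER" annotation scheme are assumptions for illustration, not the tool or format used by the authors.

    # Hypothetical sketch: annotate Vietnamese source tokens with POS/NER tags
    # before NMT training. The tag sets and the annotation format are assumptions.
    def tag_sentence(tokens, pos_tags, ner_tags):
        """Interleave each token with its POS and NER tag, e.g. 'Hà_Nội|Np|LOC'."""
        return [f"{tok}|{pos}|{ner}" for tok, pos, ner in zip(tokens, pos_tags, ner_tags)]

    tokens   = ["Hà_Nội", "là", "thủ_đô", "của", "Việt_Nam"]
    pos_tags = ["Np", "V", "N", "E", "Np"]       # assumed POS inventory
    ner_tags = ["LOC", "O", "O", "O", "LOC"]     # assumed NER inventory

    print(" ".join(tag_sentence(tokens, pos_tags, ner_tags)))
    # Hà_Nội|Np|LOC là|V|O thủ_đô|N|O của|E|O Việt_Nam|Np|LOC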

  • Personal Data Retrieval and Disambiguation in Web Person Search

    Yuliang WEI  Guodong XIN  Wei WANG  Fang LV  Bailing WANG  

     
    LETTER-Data Engineering, Web Information Systems

      Publicized:
    2018/10/24
      Vol:
    E102-D No:2
      Page(s):
    392-395

    Web person search often returns web pages related to several distinct namesakes. This paper proposes a new web page model for template-free person data extraction and uses a Dirichlet Process Mixture model to solve name disambiguation. The results show that our method works best on web pages with complex structure.
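
    As an illustration of the clustering step only, a Dirichlet Process Mixture can be approximated with scikit-learn's BayesianGaussianMixture using a Dirichlet-process prior; the random feature vectors and truncation level below are placeholders, not the letter's web page model.

    # Rough sketch: cluster person-page feature vectors with a (truncated)
    # Dirichlet Process Mixture. The feature vectors are random placeholders,
    # not the template-free page representation proposed in the letter.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    page_vectors = rng.normal(size=(30, 5))    # 30 pages, 5 assumed features each

    dpm = BayesianGaussianMixture(
        n_components=10,                                     # truncation level
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    )
    labels = dpm.fit_predict(page_vectors)     # pages sharing a label = one namesake
    print(labels)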

  • An Empirical Study of Classifier Combination Based Word Sense Disambiguation

    Wenpeng LU  Hao WU  Ping JIAN  Yonggang HUANG  Heyan HUANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2017/08/23
      Vol:
    E101-D No:1
      Page(s):
    225-233

    Word sense disambiguation (WSD) aims to identify the right sense of ambiguous words by mining their context information. Previous studies show that classifier combination is an effective approach to enhancing the performance of WSD. In this paper, we systematically review state-of-the-art methods for classifier combination based WSD, including probability-based and voting-based approaches. Furthermore, a new classifier combination based WSD method, namely the probability weighted voting method with dynamic self-adaptation, is proposed. Compared with existing approaches, the new method takes into consideration both the differences between classifiers and the differences between ambiguous instances. Exhaustive experiments are performed on a real-world dataset, and the results show the superiority of our method over state-of-the-art methods.
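
    As a toy sketch of probability-weighted voting, the snippet below combines the per-sense probability distributions of several classifiers with per-instance weights; the weighting rule (each classifier's maximum probability taken as its confidence) is only one plausible reading of "dynamic self-adaptation", not the paper's exact formula.

    # Toy sketch of classifier combination for WSD: each base classifier emits a
    # probability distribution over senses, and the vote is weighted per instance.
    # The confidence-based weighting rule is an assumption, not the paper's method.
    import numpy as np

    def combine(sense_distributions):
        """sense_distributions: (n_classifiers, n_senses) array of probabilities."""
        dists = np.asarray(sense_distributions)
        weights = dists.max(axis=1)          # per-instance confidence of each classifier
        weights = weights / weights.sum()
        combined = weights @ dists           # probability-weighted vote
        return int(np.argmax(combined)), combined

    # Three classifiers, four candidate senses for one ambiguous instance:
    best_sense, scores = combine([
        [0.6, 0.2, 0.1, 0.1],
        [0.3, 0.4, 0.2, 0.1],
        [0.5, 0.3, 0.1, 0.1],
    ])
    print(best_sense, scores)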

  • Feature Ensemble Network with Occlusion Disambiguation for Accurate Patch-Based Stereo Matching

    Xiaoqing YE  Jiamao LI  Han WANG  Xiaolin ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2017/09/14
      Vol:
    E100-D No:12
      Page(s):
    3077-3080

    Accurate stereo matching remains a challenging problem in the case of weakly textured areas, discontinuities, and occlusions. In this letter, a novel stereo matching method is presented, which leverages a feature ensemble network to compute matching costs, an error detection network to predict outliers, and priority-based occlusion disambiguation for refinement. Experiments on the Middlebury benchmark demonstrate that the proposed method yields competitive results against state-of-the-art algorithms.

  • Topic Representation of Researchers' Interests in a Large-Scale Academic Database and Its Application to Author Disambiguation

    Marie KATSURAI  Ikki OHMUKAI  Hideaki TAKEDA  

     
    PAPER

      Publicized:
    2016/01/14
      Vol:
    E99-D No:4
      Page(s):
    1010-1018

    Promoting interdisciplinary research and recommending collaborators from different research fields via academic database analysis is a crucial task. This paper addresses the problem of characterizing researchers' interests with a set of diverse research topics found in a large-scale academic database. Specifically, we first use latent Dirichlet allocation to extract topics as distributions over words from a training dataset. Then, we convert the textual features of a researcher's publications into topic vectors and calculate the centroid of these vectors to summarize the researcher's interests as a single vector. In experiments conducted on CiNii Articles, the largest academic database in Japan, we show that the extracted topics reflect the diversity of the research fields in the database. The experimental results also indicate the applicability of the proposed topic representation to the author disambiguation problem.
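
    A compact sketch of the pipeline outlined above, assuming scikit-learn's LDA implementation and a handful of toy titles; the corpus, topic count, and preprocessing are placeholders rather than the CiNii setup.

    # Sketch of the topic-representation pipeline: fit LDA on publication text,
    # map each of a researcher's papers to a topic vector, and take the centroid
    # as the researcher's interest vector. Toy data and topic count are placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    training_docs = [
        "neural machine translation attention",
        "word sense disambiguation corpus",
        "stereo matching convolutional network",
        "topic model academic database",
    ]
    vec = CountVectorizer()
    X = vec.fit_transform(training_docs)
    lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

    researcher_papers = [
        "machine translation with part of speech tags",
        "unsupervised word sense disambiguation",
    ]
    topic_vectors = lda.transform(vec.transform(researcher_papers))
    interest = topic_vectors.mean(axis=0)    # centroid = researcher's interest vector
    print(interest)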

  • Enriching Semantic Knowledge for WSD

    Junpeng CHEN  Wei YU  

     
    LETTER-Natural Language Processing

      Vol:
    E97-D No:8
      Page(s):
    2212-2216

    In our previous work, we proposed to combine ConceptNet and WordNet for Word Sense Disambiguation (WSD). ConceptNet was automatically disambiguated using Normalized Google Distance (NGD) similarity. In this letter, we present several techniques to enhance the performance of ConceptNet disambiguation and use this enriched semantic knowledge in the WSD task. We propose to enrich both the WordNet semantic knowledge and the NGD measure to disambiguate the concepts in ConceptNet. Furthermore, we apply the enriched semantic knowledge to improve the performance of WSD. A number of experiments show that the proposed method obtains enhanced results.
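
    For reference, Normalized Google Distance between terms x and y is commonly defined from their page counts f(x), f(y), their joint count f(x,y), and the index size N, as sketched below; the counts used here are invented for illustration.

    # Normalized Google Distance (NGD) between two terms, as commonly defined:
    #   NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
    #               / (log N - min(log f(x), log f(y)))
    # The counts below are invented for illustration only.
    import math

    def ngd(fx, fy, fxy, n_pages):
        lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
        return (max(lx, ly) - lxy) / (math.log(n_pages) - min(lx, ly))

    print(ngd(fx=9_000_000, fy=3_000_000, fxy=800_000, n_pages=25_000_000_000))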

  • A Method for English-Korean Target Word Selection Using Multiple Knowledge Sources

    Ki-Young LEE  Sang-Kyu PARK  Han-Woo KIM  

     
    PAPER

      Vol:
    E89-A No:6
      Page(s):
    1622-1629

    Target word selection is one of the most important and difficult tasks in English-Korean machine translation, and it affects the overall translation accuracy of machine translation systems. In this paper, we present a new approach to selecting a Korean target word for an English noun with translation ambiguities, using multiple knowledge sources such as verb frame patterns, sense vectors based on collocations, statistical Korean local context information, and co-occurring POS information. Verb frame patterns, constructed from a dictionary and a corpus, play an important role in resolving the sparseness problem of collocation data. Sense vectors are sets of collocation data used when an English word with target selection ambiguities is to be translated to a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. The co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. To evaluate our approach, we applied the method to the Tellus-EK system, an English-Korean automatic translation system currently developed at ETRI [1],[2]. The experiment showed promising results for diverse sentences from web documents.
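
    A toy sketch of the kind of score combination described above: each candidate Korean target word receives a score from each knowledge source, and a weighted sum decides the winner. The score values and weights are invented placeholders, not the Tellus-EK components.

    # Toy sketch of target word selection by combining scores from several
    # knowledge sources. Scores and weights are invented placeholders.
    candidates = ["은행", "강둑"]        # candidate Korean translations of "bank"

    scores = {   # candidate -> (verb-frame, collocation, local-context, POS) scores
        "은행": (0.8, 0.7, 0.6, 0.5),
        "강둑": (0.2, 0.3, 0.4, 0.5),
    }
    weights = (0.4, 0.3, 0.2, 0.1)       # assumed relative weights of the sources

    def combined(cand):
        return sum(w * s for w, s in zip(weights, scores[cand]))

    best = max(candidates, key=combined)
    print(best, combined(best))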

  • Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora

    Hiroyuki KAJI  Yasutsugu MORIMOTO  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:2
      Page(s):
    289-301

    An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
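
    A minimal sketch of the final selection step only: given a precomputed sense-vs.-clue correlation matrix and clue weights, each sense of the target word is scored by a weighted sum over the clues observed in the context. The correlation values and weights are invented, and the iterative estimation of the matrix is not reproduced.

    # Sketch of the sense-selection step: score each sense by a weighted sum of
    # its correlations with the clue words in the context. Values are invented;
    # the paper's iterative sense-vs.-clue estimation is not shown here.
    senses_vs_clues = {                  # sense -> clue -> correlation (assumed)
        "bank/finance": {"loan": 0.9, "interest": 0.8, "water": 0.1},
        "bank/river":   {"loan": 0.1, "interest": 0.2, "water": 0.9},
    }
    clue_weight = {"loan": 1.0, "interest": 0.7, "water": 1.0}

    def select_sense(context_words):
        scores = {
            sense: sum(clue_weight[w] * corr.get(w, 0.0)
                       for w in context_words if w in clue_weight)
            for sense, corr in senses_vs_clues.items()
        }
        return max(scores, key=scores.get), scores

    print(select_sense(["loan", "interest"]))   # -> ('bank/finance', ...)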

  • A Probabilistic Feature-Based Parsing Model for Head-Final Languages

    So-Young PARK  Yong-Jae KWAK  Joon-Ho LIM  Hae-Chang RIM  

     
    LETTER-Natural Language Processing

      Vol:
    E87-D No:12
      Page(s):
    2893-2897

    In this paper, we propose a probabilistic feature-based parsing model for head-final languages, which can lead to an improvement of syntactic disambiguation while reducing the parsing cost related to lexical information. For effective syntactic disambiguation, the proposed parsing model utilizes several useful features such as a syntactic label feature, a content feature, a functional feature, and a size feature. Moreover, it is designed to be suitable for representing word order variation of non-head words in head-final languages. Experimental results show that the proposed parsing model performs better than previous lexicalized parsing models, although it has much less dependence on lexical information.

  • Decision Tree Based Disambiguation of Semantic Roles for Korean Adverbial Postpositions

    Seong-Bae PARK  

     
    LETTER-Natural Language Processing

      Vol:
    E86-D No:8
      Page(s):
    1459-1463

    Case postpositions in Korean usually have more than one semantic role. Among the various postpositions, adverbial postpositions in particular make the development of Korean-based machine translation systems difficult, because they have more semantic roles than the others. In this paper, we describe a new method for resolving the semantic ambiguities of adverbial postpositions using decision tree induction. The lack of training examples for decision tree induction is overcome by clustering words into classes using a kind of greedy algorithm. Cross-validation results show that the presented method achieves 76.5% accuracy on average, which is a 20.3% improvement over the baseline method.
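
    As an illustration of decision tree induction over class-based features, the tiny sketch below trains scikit-learn's DecisionTreeClassifier; the feature encoding (word-class identifiers of the governing noun and predicate) and the role labels are placeholders, not the paper's clusters.

    # Toy sketch of decision-tree induction for postposition semantic roles:
    # each instance is encoded with coarse word-class features (placeholders
    # for the clusters produced by the paper's greedy word-clustering step).
    from sklearn.tree import DecisionTreeClassifier

    # Assumed features: [noun word-class id, predicate word-class id]
    X = [[0, 3], [0, 4], [1, 3], [2, 4], [2, 3], [1, 4]]
    y = ["LOCATION", "INSTRUMENT", "LOCATION", "INSTRUMENT", "GOAL", "GOAL"]

    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(clf.predict([[0, 3]]))   # predicted semantic role for a new instance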

  • Disambiguating Word Senses in Korean-Japanese Machine Translation by Using Semi-Automatically Constructed Ontology

    Sin-Jae KANG  You-Jin CHUNG  Jong-Hyeok LEE  

     
    PAPER-Natural Language Processing

      Vol:
    E85-D No:10
      Page(s):
    1688-1697

    This paper presents a method for disambiguating word senses in Korean-Japanese machine translation by using a language-independent ontology. This ontology stores semantic constraints between concepts and other world knowledge, and it enables a natural language processing system to resolve semantic ambiguities by making inferences over the concept network of the ontology. In order to acquire a language-independent and reasonably practical ontology in a limited time and with limited manpower, we extend the existing Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built electronic dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus. In practical machine translation systems, our word sense disambiguation method improved average precision by 6.0% for Japanese analysis and by 9.2% for Korean analysis over the method without an ontology.

  • A Method of Case Structure Analysis for Japanese Sentences Based on Examples in Case Frame Dictionary

    Sadao KUROHASHI  Makoto NAGAO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    227-239

    A case structure expression is one of the most important forms for representing the meaning of a sentence. Case structure analysis is usually performed by consulting case frame information in a verb dictionary. However, this analysis is very difficult because of several problems, such as word sense ambiguity and structural ambiguity. A conventional way of addressing these problems is selectional restriction, but the semantic marker (SM) method it relies on suffers from a trade-off between descriptive power and construction cost. In this paper, we propose a method of case structure analysis based on examples in a case frame dictionary. This method uses a case frame dictionary that has some typical example sentences for each case frame, and it selects a proper case frame for an input sentence by matching the input sentence against the examples in the case frame dictionary. The best matching score, which is used for selecting a proper case frame for a predicate, can be regarded as the score for the case structure of that predicate. Therefore, when a sentence has two or more readings because of structural ambiguity, the best reading can be selected by evaluating the sum of the scores for the case structures of all predicates in the sentence. We report on experiments which show that this method is superior to the conventional, coarse-grained SM method, and we also describe the superiority of the example-based method over the SM method.
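
    A schematic sketch of example-based case frame selection: the input's case fillers are matched against the example fillers stored in each candidate case frame, and the frame with the best similarity score is chosen. The word-similarity function and the toy case frames are assumptions; a real system would score filler similarity with a thesaurus.

    # Schematic sketch of example-based case frame selection: pick the case frame
    # whose stored example fillers best match the fillers of the input sentence.
    # The similarity function and the toy case frames are placeholders.
    case_frames = {   # verb sense -> case slot -> example fillers
        "open-door":  {"ga": ["person"],  "wo": ["door", "window"]},
        "open-store": {"ga": ["company"], "wo": ["store", "branch"]},
    }

    def word_sim(a, b):
        # Placeholder similarity; a real system would use a thesaurus distance.
        return 1.0 if a == b else 0.0

    def frame_score(frame, fillers):
        return sum(max((word_sim(filler, ex) for ex in frame.get(slot, [])), default=0.0)
                   for slot, filler in fillers.items())

    fillers = {"ga": "person", "wo": "window"}
    best = max(case_frames, key=lambda name: frame_score(case_frames[name], fillers))
    print(best, frame_score(case_frames[best], fillers))   # -> open-door 2.0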

  • A Preferential Constraint Satisfaction Technique for Natural Language Analysis

    Katashi NAGAO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    161-170

    In this paper, we present a new technique for the semantic analysis of sentences, including an ambiguity-packing method that generates a packed representation of individual syntactic and semantic structures. This representation is based on a dependency structure with constraints that must be satisfied in the syntax-semantics mapping phase. Complete syntax-semantics mapping is not performed until all ambiguities have been resolved, thus avoiding the combinatorial explosions that sometimes occur when unpacking locally packed ambiguities. A constraint satisfaction technique makes it possible to resolve ambiguities efficiently without unpacking. Disambiguation is the process of applying syntactic and semantic constraints to the possible candidate solutions (such as modifiees, cases, and word senses) and removing unsatisfactory candidates. Since several candidates often remain after applying constraints, another kind of knowledge is required to enable selection of the most plausible candidate solution. We call this new knowledge a preference. Both constraints and preferences must be applied in coordination for disambiguation: either of them alone is insufficient for the purpose, and the interactions between them are important. We also present an algorithm for controlling the interaction between the constraints and the preferences in the disambiguation process. By allowing the preferences to control the application of the constraints, ambiguities can be resolved efficiently, avoiding combinatorial explosions.

  • Example-Based Word-Sense Disambiguation

    Naohiko URAMOTO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    240-246

    This paper presents a new method for resolving lexical (word sense) ambiguities inherent in natural language sentences. The Sentence Analyzer (SENA) was developed to resolve such ambiguities by using constraints and example-based preferences. The ambiguities are packed into a single dependency structure, and grammatical and lexical constraints are applied to it in order to reduce the degree of ambiguity. The application of constraints is realized by a very effective constraint-satisfaction technique. Remaining ambiguities are resolved by the use of preferences calculated from an example-base, which is a set of fully parsed word-to-word dependencies acquired semi-automatically from on-line dictionaries.