
Keyword Search Result

[Keyword] disambiguation (14 hits)

Results 1-14 of 14
  • Korean-Vietnamese Neural Machine Translation with Named Entity Recognition and Part-of-Speech Tags

    Van-Hai VU  Quang-Phuoc NGUYEN  Kiem-Hieu NGUYEN  Joon-Choul SHIN  Cheol-Young OCK  

     
    PAPER-Natural Language Processing

      Publicized:
    2020/01/15
      Vol:
    E103-D No:4
      Page(s):
    866-873

    Since deep learning was introduced, a series of achievements has been published in the field of automatic machine translation (MT). However, Korean-Vietnamese MT systems face many challenges because of a lack of data, multiple meanings of individual words, and grammatical diversity that depends on context. Therefore, the quality of Korean-Vietnamese MT systems is still sub-optimal. This paper discusses a method for applying Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to Vietnamese sentences to improve the performance of Korean-Vietnamese MT systems. In terms of implementation, we used a tool to tag NER and POS in Vietnamese sentences. In addition, we had access to a Korean-Vietnamese parallel corpus of more than 450K sentence pairs from our previous research. The experimental results indicate that tagging NER and POS in Vietnamese sentences can improve the quality of Korean-Vietnamese neural MT (NMT) in terms of Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores. On average, our MT system improved by 1.21 BLEU points or 2.33 TER points after applying both NER and POS tagging to the Vietnamese corpus. Due to the structural features of the languages, MT systems in the Korean-to-Vietnamese direction always give better BLEU and TER results than systems in the reverse direction.
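
    As a hedged illustration of the preprocessing step described above, the sketch below interleaves each Vietnamese token with a POS and NER tag before the sentences would be passed to an NMT toolkit. The tagging functions, tag inventory, and the "token|POS|NER" annotation scheme are assumptions for illustration, not the tool or format used by the authors.

    # Hypothetical sketch: annotate Vietnamese source tokens with POS/NER tags
    # before NMT training. The tag sets and the annotation format are assumptions.
    def tag_sentence(tokens, pos_tags, ner_tags):
        """Interleave each token with its POS and NER tag, e.g. 'Hà_Nội|Np|LOC'."""
        return [f"{tok}|{pos}|{ner}" for tok, pos, ner in zip(tokens, pos_tags, ner_tags)]

    tokens   = ["Hà_Nội", "là", "thủ_đô", "của", "Việt_Nam"]
    pos_tags = ["Np", "V", "N", "E", "Np"]       # assumed POS inventory
    ner_tags = ["LOC", "O", "O", "O", "LOC"]     # assumed NER inventory

    print(" ".join(tag_sentence(tokens, pos_tags, ner_tags)))
    # Hà_Nội|Np|LOC là|V|O thủ_đô|N|O của|E|O Việt_Nam|Np|LOC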

  • Personal Data Retrieval and Disambiguation in Web Person Search

    Yuliang WEI  Guodong XIN  Wei WANG  Fang LV  Bailing WANG  

     
    LETTER-Data Engineering, Web Information Systems

      Publicized:
    2018/10/24
      Vol:
    E102-D No:2
      Page(s):
    392-395

    Web person search often returns web pages related to several distinct namesakes. This paper proposes a new web page model for template-free person data extraction and uses a Dirichlet Process Mixture model to solve name disambiguation. The results show that our method works best on web pages with complex structure.
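
    As an illustration of the clustering step only, a Dirichlet Process Mixture can be approximated with scikit-learn's BayesianGaussianMixture using a Dirichlet-process prior; the random feature vectors and truncation level below are placeholders, not the letter's web page model.

    # Rough sketch: cluster person-page feature vectors with a (truncated)
    # Dirichlet Process Mixture. The feature vectors are random placeholders,
    # not the template-free page representation proposed in the letter.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    page_vectors = rng.normal(size=(30, 5))    # 30 pages, 5 assumed features each

    dpm = BayesianGaussianMixture(
        n_components=10,                                     # truncation level
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    )
    labels = dpm.fit_predict(page_vectors)     # pages sharing a label = one namesake
    print(labels)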

  • An Empirical Study of Classifier Combination Based Word Sense Disambiguation

    Wenpeng LU  Hao WU  Ping JIAN  Yonggang HUANG  Heyan HUANG  

     
    PAPER-Natural Language Processing

      Publicized:
    2017/08/23
      Vol:
    E101-D No:1
      Page(s):
    225-233

    Word sense disambiguation (WSD) aims to identify the right sense of ambiguous words by mining their context information. Previous studies show that classifier combination is an effective approach to enhancing the performance of WSD. In this paper, we systematically review state-of-the-art methods for classifier combination based WSD, including probability-based and voting-based approaches. Furthermore, a new classifier combination based WSD method, namely the probability weighted voting method with dynamic self-adaptation, is proposed. Compared with existing approaches, the new method takes into consideration both the differences between classifiers and the differences between ambiguous instances. Exhaustive experiments are performed on a real-world dataset, and the results show the superiority of our method over state-of-the-art methods.
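
    As a toy sketch of probability-weighted voting, the snippet below combines the per-sense probability distributions of several classifiers with per-instance weights; the weighting rule (each classifier's maximum probability taken as its confidence) is only one plausible reading of "dynamic self-adaptation", not the paper's exact formula.

    # Toy sketch of classifier combination for WSD: each base classifier emits a
    # probability distribution over senses, and the vote is weighted per instance.
    # The confidence-based weighting rule is an assumption, not the paper's method.
    import numpy as np

    def combine(sense_distributions):
        """sense_distributions: (n_classifiers, n_senses) array of probabilities."""
        dists = np.asarray(sense_distributions)
        weights = dists.max(axis=1)          # per-instance confidence of each classifier
        weights = weights / weights.sum()
        combined = weights @ dists           # probability-weighted vote
        return int(np.argmax(combined)), combined

    # Three classifiers, four candidate senses for one ambiguous instance:
    best_sense, scores = combine([
        [0.6, 0.2, 0.1, 0.1],
        [0.3, 0.4, 0.2, 0.1],
        [0.5, 0.3, 0.1, 0.1],
    ])
    print(best_sense, scores)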

  • Feature Ensemble Network with Occlusion Disambiguation for Accurate Patch-Based Stereo Matching

    Xiaoqing YE  Jiamao LI  Han WANG  Xiaolin ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2017/09/14
      Vol:
    E100-D No:12
      Page(s):
    3077-3080

    Accurate stereo matching remains a challenging problem in the case of weakly textured areas, discontinuities, and occlusions. In this letter, a novel stereo matching method is presented, which leverages a feature ensemble network to compute matching costs, an error detection network to predict outliers, and priority-based occlusion disambiguation for refinement. Experiments on the Middlebury benchmark demonstrate that the proposed method yields competitive results against state-of-the-art algorithms.

  • Topic Representation of Researchers' Interests in a Large-Scale Academic Database and Its Application to Author Disambiguation

    Marie KATSURAI  Ikki OHMUKAI  Hideaki TAKEDA  

     
    PAPER

      Publicized:
    2016/01/14
      Vol:
    E99-D No:4
      Page(s):
    1010-1018

    Promoting interdisciplinary research and recommending collaborators from different research fields via academic database analysis is a crucial task. This paper addresses the problem of characterizing researchers' interests with a set of diverse research topics found in a large-scale academic database. Specifically, we first use latent Dirichlet allocation to extract topics as distributions over words from a training dataset. Then, we convert the textual features of a researcher's publications into topic vectors and calculate the centroid of these vectors to summarize the researcher's interests as a single vector. In experiments conducted on CiNii Articles, the largest academic database in Japan, we show that the extracted topics reflect the diversity of the research fields in the database. The experimental results also indicate the applicability of the proposed topic representation to the author disambiguation problem.
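
    A compact sketch of the pipeline outlined above, assuming scikit-learn's LDA implementation and a handful of toy titles; the corpus, topic count, and preprocessing are placeholders rather than the CiNii setup.

    # Sketch of the topic-representation pipeline: fit LDA on publication text,
    # map each of a researcher's papers to a topic vector, and take the centroid
    # as the researcher's interest vector. Toy data and topic count are placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    training_docs = [
        "neural machine translation attention",
        "word sense disambiguation corpus",
        "stereo matching convolutional network",
        "topic model academic database",
    ]
    vec = CountVectorizer()
    X = vec.fit_transform(training_docs)
    lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

    researcher_papers = [
        "machine translation with part of speech tags",
        "unsupervised word sense disambiguation",
    ]
    topic_vectors = lda.transform(vec.transform(researcher_papers))
    interest = topic_vectors.mean(axis=0)    # centroid = researcher's interest vector
    print(interest)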

  • Enriching Semantic Knowledge for WSD

    Junpeng CHEN  Wei YU  

     
    LETTER-Natural Language Processing

      Vol:
    E97-D No:8
      Page(s):
    2212-2216

    In our previous work, we proposed to combine ConceptNet and WordNet for Word Sense Disambiguation (WSD). ConceptNet was automatically disambiguated using Normalized Google Distance (NGD) similarity. In this letter, we present several techniques to enhance the performance of ConceptNet disambiguation and use this enriched semantic knowledge in the WSD task. We propose to enrich both the WordNet semantic knowledge and the NGD measure to disambiguate the concepts in ConceptNet. Furthermore, we apply the enriched semantic knowledge to improve the performance of WSD. A number of experiments show that the proposed method obtains enhanced results.
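
    For reference, Normalized Google Distance between terms x and y is commonly defined from their page counts f(x), f(y), their joint count f(x,y), and the index size N, as sketched below; the counts used here are invented for illustration.

    # Normalized Google Distance (NGD) between two terms, as commonly defined:
    #   NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
    #               / (log N - min(log f(x), log f(y)))
    # The counts below are invented for illustration only.
    import math

    def ngd(fx, fy, fxy, n_pages):
        lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
        return (max(lx, ly) - lxy) / (math.log(n_pages) - min(lx, ly))

    print(ngd(fx=9_000_000, fy=3_000_000, fxy=800_000, n_pages=25_000_000_000))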

  • A Method for English-Korean Target Word Selection Using Multiple Knowledge Sources

    Ki-Young LEE  Sang-Kyu PARK  Han-Woo KIM  

     
    PAPER

      Vol:
    E89-A No:6
      Page(s):
    1622-1629

    Target word selection is one of the most important and difficult tasks in English-Korean machine translation, and it affects the overall translation accuracy of machine translation systems. In this paper, we present a new approach to selecting a Korean target word for an English noun with translation ambiguities, using multiple knowledge sources such as verb frame patterns, sense vectors based on collocations, statistical Korean local context information, and co-occurring POS information. Verb frame patterns, constructed from a dictionary and a corpus, play an important role in resolving the sparseness problem of collocation data. Sense vectors are sets of collocation data used when an English word with target selection ambiguities is to be translated to a specific Korean target word. Statistical Korean local context information is N-gram information generated from a Korean corpus. The co-occurring POS information is a statistically significant POS clue that appears with the ambiguous word. To evaluate our approach, we applied the method to the Tellus-EK system, an English-Korean automatic translation system currently developed at ETRI [1],[2]. The experiment showed promising results for diverse sentences from web documents.
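
    A toy sketch of the kind of score combination described above: each candidate Korean target word receives a score from each knowledge source, and a weighted sum decides the winner. The score values and weights are invented placeholders, not the Tellus-EK components.

    # Toy sketch of target word selection by combining scores from several
    # knowledge sources. Scores and weights are invented placeholders.
    candidates = ["은행", "강둑"]        # candidate Korean translations of "bank"

    scores = {   # candidate -> (verb-frame, collocation, local-context, POS) scores
        "은행": (0.8, 0.7, 0.6, 0.5),
        "강둑": (0.2, 0.3, 0.4, 0.5),
    }
    weights = (0.4, 0.3, 0.2, 0.1)       # assumed relative weights of the sources

    def combined(cand):
        return sum(w * s for w, s in zip(weights, scores[cand]))

    best = max(candidates, key=combined)
    print(best, combined(best))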

  • Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora

    Hiroyuki KAJI  Yasutsugu MORIMOTO  

     
    PAPER-Natural Language Processing

      Vol:
    E88-D No:2
      Page(s):
    289-301

    An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
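
    A minimal sketch of the final selection step only: given a precomputed sense-vs.-clue correlation matrix and clue weights, each sense of the target word is scored by a weighted sum over the clues observed in the context. The correlation values and weights are invented, and the iterative estimation of the matrix is not reproduced.

    # Sketch of the sense-selection step: score each sense by a weighted sum of
    # its correlations with the clue words in the context. Values are invented;
    # the paper's iterative sense-vs.-clue estimation is not shown here.
    senses_vs_clues = {                  # sense -> clue -> correlation (assumed)
        "bank/finance": {"loan": 0.9, "interest": 0.8, "water": 0.1},
        "bank/river":   {"loan": 0.1, "interest": 0.2, "water": 0.9},
    }
    clue_weight = {"loan": 1.0, "interest": 0.7, "water": 1.0}

    def select_sense(context_words):
        scores = {
            sense: sum(clue_weight[w] * corr.get(w, 0.0)
                       for w in context_words if w in clue_weight)
            for sense, corr in senses_vs_clues.items()
        }
        return max(scores, key=scores.get), scores

    print(select_sense(["loan", "interest"]))   # -> ('bank/finance', ...)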

  • A Probabilistic Feature-Based Parsing Model for Head-Final Languages

    So-Young PARK  Yong-Jae KWAK  Joon-Ho LIM  Hae-Chang RIM  

     
    LETTER-Natural Language Processing

      Vol:
    E87-D No:12
      Page(s):
    2893-2897

    In this paper, we propose a probabilistic feature-based parsing model for head-final languages, which can lead to an improvement of syntactic disambiguation while reducing the parsing cost related to lexical information. For effective syntactic disambiguation, the proposed parsing model utilizes several useful features such as a syntactic label feature, a content feature, a functional feature, and a size feature. Moreover, it is designed to be suitable for representing word order variation of non-head words in head-final languages. Experimental results show that the proposed parsing model performs better than previous lexicalized parsing models, although it has much less dependence on lexical information.

  • Decision Tree Based Disambiguation of Semantic Roles for Korean Adverbial Postpositions

    Seong-Bae PARK  

     
    LETTER-Natural Language Processing

      Vol:
    E86-D No:8
      Page(s):
    1459-1463

    Case postpositions in Korean usually have more than one semantic role. Among the various postpositions, adverbial postpositions in particular make the development of Korean-based machine translation systems difficult, because they have more semantic roles than the others. In this paper, we describe a new method for resolving the semantic ambiguities of adverbial postpositions using decision tree induction. The lack of training examples for decision tree induction is overcome by clustering words into classes using a kind of greedy algorithm. Cross-validation results show that the presented method achieves 76.5% accuracy on average, which is a 20.3% improvement over the baseline method.
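
    As an illustration of decision tree induction over class-based features, the tiny sketch below trains scikit-learn's DecisionTreeClassifier; the feature encoding (word-class identifiers of the governing noun and predicate) and the role labels are placeholders, not the paper's clusters.

    # Toy sketch of decision-tree induction for postposition semantic roles:
    # each instance is encoded with coarse word-class features (placeholders
    # for the clusters produced by the paper's greedy word-clustering step).
    from sklearn.tree import DecisionTreeClassifier

    # Assumed features: [noun word-class id, predicate word-class id]
    X = [[0, 3], [0, 4], [1, 3], [2, 4], [2, 3], [1, 4]]
    y = ["LOCATION", "INSTRUMENT", "LOCATION", "INSTRUMENT", "GOAL", "GOAL"]

    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(clf.predict([[0, 3]]))   # predicted semantic role for a new instance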

  • Disambiguating Word Senses in Korean-Japanese Machine Translation by Using Semi-Automatically Constructed Ontology

    Sin-Jae KANG  You-Jin CHUNG  Jong-Hyeok LEE  

     
    PAPER-Natural Language Processing

      Vol:
    E85-D No:10
      Page(s):
    1688-1697

    This paper presents a method for disambiguating word senses in Korean-Japanese machine translation by using a language-independent ontology. This ontology stores semantic constraints between concepts and other world knowledge, and it enables a natural language processing system to resolve semantic ambiguities by making inferences over the concept network of the ontology. In order to acquire a language-independent and reasonably practical ontology in a limited time and with limited manpower, we extend the existing Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built electronic dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from a corpus. In practical machine translation systems, our word sense disambiguation method improved average precision by 6.0% for Japanese analysis and by 9.2% for Korean analysis over the method without an ontology.

  • A Method of Case Structure Analysis for Japanese Sentences Based on Examples in Case Frame Dictionary

    Sadao KUROHASHI  Makoto NAGAO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    227-239

    A case structure expression is one of the most important forms for representing the meaning of a sentence. Case structure analysis is usually performed by consulting case frame information in a verb dictionary. However, this analysis is very difficult because of several problems, such as word sense ambiguity and structural ambiguity. A conventional way of addressing these problems is selectional restriction, but the semantic marker (SM) method it relies on suffers from a trade-off between descriptive power and construction cost. In this paper, we propose a method of case structure analysis based on examples in a case frame dictionary. This method uses a case frame dictionary that has some typical example sentences for each case frame, and it selects a proper case frame for an input sentence by matching the input sentence against the examples in the case frame dictionary. The best matching score, which is used for selecting a proper case frame for a predicate, can be regarded as the score for the case structure of that predicate. Therefore, when a sentence has two or more readings because of structural ambiguity, the best reading can be selected by evaluating the sum of the scores for the case structures of all predicates in the sentence. We report on experiments which show that this method is superior to the conventional, coarse-grained SM method, and we also describe the superiority of the example-based method over the SM method.
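
    A schematic sketch of example-based case frame selection: the input's case fillers are matched against the example fillers stored in each candidate case frame, and the frame with the best similarity score is chosen. The word-similarity function and the toy case frames are assumptions; a real system would score filler similarity with a thesaurus.

    # Schematic sketch of example-based case frame selection: pick the case frame
    # whose stored example fillers best match the fillers of the input sentence.
    # The similarity function and the toy case frames are placeholders.
    case_frames = {   # verb sense -> case slot -> example fillers
        "open-door":  {"ga": ["person"],  "wo": ["door", "window"]},
        "open-store": {"ga": ["company"], "wo": ["store", "branch"]},
    }

    def word_sim(a, b):
        # Placeholder similarity; a real system would use a thesaurus distance.
        return 1.0 if a == b else 0.0

    def frame_score(frame, fillers):
        return sum(max((word_sim(filler, ex) for ex in frame.get(slot, [])), default=0.0)
                   for slot, filler in fillers.items())

    fillers = {"ga": "person", "wo": "window"}
    best = max(case_frames, key=lambda name: frame_score(case_frames[name], fillers))
    print(best, frame_score(case_frames[best], fillers))   # -> open-door 2.0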

  • A Preferential Constraint Satisfaction Technique for Natural Language Analysis

    Katashi NAGAO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    161-170

    In this paper, we present a new technique for the semantic analysis of sentences, including an ambiguity-packing method that generates a packed representation of individual syntactic and semantic structures. This representation is based on a dependency structure with constraints that must be satisfied in the syntax-semantics mapping phase. Complete syntax-semantics mapping is not performed until all ambiguities have been resolved, thus avoiding the combinatorial explosions that sometimes occur when unpacking locally packed ambiguities. A constraint satisfaction technique makes it possible to resolve ambiguities efficiently without unpacking. Disambiguation is the process of applying syntactic and semantic constraints to the possible candidate solutions (such as modifiees, cases, and word senses) and removing unsatisfactory candidates. Since several candidates often remain after applying constraints, another kind of knowledge is required to enable selection of the most plausible candidate solution. We call this new knowledge a preference. Both constraints and preferences must be applied in coordination for disambiguation: either of them alone is insufficient for the purpose, and the interactions between them are important. We also present an algorithm for controlling the interaction between the constraints and the preferences in the disambiguation process. By allowing the preferences to control the application of the constraints, ambiguities can be resolved efficiently, avoiding combinatorial explosions.

  • Example-Based Word-Sense Disambiguation

    Naohiko URAMOTO  

     
    PAPER

      Vol:
    E77-D No:2
      Page(s):
    240-246

    This paper presents a new method for resolving lexical (word sense) ambiguities inherent in natural language sentences. The Sentence Analyzer (SENA) was developed to resolve such ambiguities by using constraints and example-based preferences. The ambiguities are packed into a single dependency structure, and grammatical and lexical constraints are applied to it in order to reduce the degree of ambiguity. The application of constraints is realized by a very effective constraint-satisfaction technique. Remaining ambiguities are resolved by the use of preferences calculated from an example-base, which is a set of fully parsed word-to-word dependencies acquired semi-automatically from on-line dictionaries.