Author Search Result

[Author] Hae-Chang RIM (16 hits)

Results 1-16 of 16
  • A Probabilistic Feature-Based Parsing Model for Head-Final Languages

    So-Young PARK, Yong-Jae KWAK, Joon-Ho LIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D No:12  Page(s): 2893-2897

    In this paper, we propose a probabilistic feature-based parsing model for head-final languages, which can lead to an improvement of syntactic disambiguation while reducing the parsing cost related to lexical information. For effective syntactic disambiguation, the proposed parsing model utilizes several useful features such as a syntactic label feature, a content feature, a functional feature, and a size feature. Moreover, it is designed to be suitable for representing word order variation of non-head words in head-final languages. Experimental results show that the proposed parsing model performs better than previous lexicalized parsing models, although it has much less dependence on lexical information.

  • Improving Parsing Performance Using Corpus-Based Temporal Expression Analysis

    Juntae YOON, Seonho KIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D No:12  Page(s): 2898-2902

    This paper presents a method for improving the performance of syntactic analysis by using accurate temporal expression processing. Temporal expressions cause parsing errors due to their syntactic duality, and resolving them is not trivial because the syntactic role of a temporal expression can only be determined from its context. In our work, the syntactic functions of temporal words are identified decisively from the local contexts of individual temporal words, which are acquired from a large corpus and represented with a finite-state method. Experimental results show how the proposed method, incorporated into parsing, improves the accuracy and efficiency of syntactic analysis.

  • Minimizing Human Intervention for Constructing Korean Part-of-Speech Tagged Corpus

    Do-Gil LEE, Gumwon HONG, Seok Kee LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E93-D No:8  Page(s): 2336-2338

    The construction of annotated corpora requires considerable manual effort. This paper presents a pragmatic method for minimizing human intervention in the construction of a Korean part-of-speech (POS) tagged corpus. Instead of focusing on improving the performance of conventional automatic POS taggers, we devise a discriminative POS tagger that can selectively produce either a single analysis or multiple analyses, depending on the tagging reliability. The proposed approach uses two decision rules to judge the tagging reliability. Experimental results show that the proposed approach can effectively control both the quality of the corpus and the amount of manual annotation through the threshold values of the rules.
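    A minimal sketch of the selective-output idea described above, assuming a tagger that returns a probability for each candidate tag; the two decision rules and the threshold values here are hypothetical, not the paper's actual rules:

      # Emit a single analysis only when the tagger is confident enough; otherwise
      # return several candidates and leave the decision to a human annotator.
      def select_analyses(tag_probs, prob_threshold=0.95, margin_threshold=0.5):
          """tag_probs: dict mapping candidate POS tags to tagger probabilities."""
          ranked = sorted(tag_probs.items(), key=lambda kv: kv[1], reverse=True)
          best_tag, best_p = ranked[0]
          second_p = ranked[1][1] if len(ranked) > 1 else 0.0
          # Rule 1 (hypothetical): the best tag is very probable on its own.
          # Rule 2 (hypothetical): the best tag clearly beats the runner-up.
          if best_p >= prob_threshold or best_p - second_p >= margin_threshold:
              return [best_tag]                     # auto-annotated
          return [tag for tag, _ in ranked[:3]]     # passed on for manual annotation

      print(select_analyses({"NNG": 0.97, "VV": 0.02, "MAG": 0.01}))  # ['NNG']
      print(select_analyses({"NNG": 0.45, "VV": 0.40, "MAG": 0.15}))  # three candidates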

  • Naïve Probabilistic Shift-Reduce Parsing Model Using Functional Word Based Context for Agglutinative Languages

    Yong-Jae KWAK, So-Young PARK, Joon-Ho LIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E87-D No:9  Page(s): 2286-2289

    In this paper, we propose a naïve probabilistic shift-reduce parsing model which can use contextual information more flexibly than previous probabilistic GLR parsing models, and which exploits a characteristic of agglutinative languages, namely that functional words are highly developed. Experimental results on Korean show that our model, using the proposed contextual information, improves parsing accuracy more effectively than previous models. Moreover, it is compact in model size and robust with a small training set.

  • A Definitional Question Answering System Based on Phrase Extraction Using Syntactic Patterns

    Kyoung-Soo HAN, Young-In SONG, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E89-D No:4  Page(s): 1601-1605

    We propose a definitional question answering system that extracts phrases using syntactic patterns, which are easy to construct manually and can reduce the coverage problem. Experimental results show that our phrase extraction system outperforms a sentence extraction system in terms of recall and precision, especially for selecting concise answers, and indicate that the text unit chosen for answer candidates and for the final answer has a significant effect on system performance.
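    A rough sketch of pattern-based phrase extraction in this spirit; the paper's patterns operate on syntactic structure, so the surface regular expressions below are only stand-ins:

      import re

      # Surface approximations of two common definitional patterns (appositive, copula).
      PATTERNS = [
          re.compile(r"(?P<target>[\w .-]+), (?:a|an|the) (?P<phrase>[^,.]+)"),
          re.compile(r"(?P<target>[\w .-]+) is (?:a|an|the) (?P<phrase>[^,.]+)"),
      ]

      def extract_definition_phrases(target, sentences):
          phrases = []
          for sentence in sentences:
              for pattern in PATTERNS:
                  m = pattern.search(sentence)
                  if m and m.group("target").strip().lower() == target.lower():
                      phrases.append(m.group("phrase").strip())
          return phrases

      print(extract_definition_phrases(
          "TREC", ["TREC, an annual information retrieval evaluation, began in 1992."]))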

  • Three-Phase Text Error Correction Model for Korean SMS Messages

    Jeunghyun BYUN, So-Young PARK, Seung-Wook LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E92-D No:5  Page(s): 1213-1217

    In this paper, we propose a three-phase text error correction model consisting of a word spacing error correction phase, a syllable-based spelling error correction phase, and a word-based spelling error correction phase. In order to reduce the text error correction complexity, the proposed model corrects text errors step by step. With the aim of correcting word spacing errors, spelling errors, and mixed errors in SMS messages, the proposed model tries to separately manage the word spacing error correction phase and the spelling error correction phase. For the purpose of utilizing both the syllable-based approach covering various errors and the word-based approach correcting some specific errors accurately, the proposed model subdivides the spelling error correction phase into the syllable-based phase and the word-based phase. Experimental results show that the proposed model can improve the performance by solving the text error correction problem based on the divide-and-conquer strategy.
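    A toy sketch of the divide-and-conquer pipeline described above; the three correctors below are trivial placeholders for the paper's statistical models:

      def correct_word_spacing(text):
          return text                       # phase 1: re-segment words (placeholder)

      def correct_spelling_by_syllable(text):
          return text.replace("0", "o")     # phase 2: syllable-level substitution (toy rule)

      def correct_spelling_by_word(text):
          lexicon = {"pls": "please", "thx": "thanks"}   # phase 3: word-level fixes
          return " ".join(lexicon.get(w, w) for w in text.split())

      def correct_sms(text):
          for phase in (correct_word_spacing,
                        correct_spelling_by_syllable,
                        correct_spelling_by_word):
              text = phase(text)            # each phase handles one error type in turn
          return text

      print(correct_sms("call me s00n pls"))   # -> "call me soon please"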

  • Utilizing the Web for Automatic Word Spacing

    Gumwon HONG, Jeong-Hoon LEE, Young-In SONG, Do-Gil LEE, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E92-D No:12  Page(s): 2553-2556

    This paper presents a new approach to the word spacing problem that mines reliable words from the Web and uses them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train the parameters of word spacing models. However, the insufficiency and irrelevancy of training examples is always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, this paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that utilizes these words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that the proposed approach achieves better performance than conventional word spacing approaches.
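    A minimal sketch of how a vocabulary of reliable words mined from the Web could drive word spacing; the mining step is omitted and the cost function is an assumption, not the paper's model:

      # Toy dynamic-programming word spacer that prefers segments found in a vocabulary
      # mined from the Web (here just a hand-made set standing in for the mined words).
      def space_words(text, vocab, max_len=8):
          n = len(text)
          best = [None] * (n + 1)           # best[i] = (cost, split position) for text[:i]
          best[0] = (0, 0)
          for i in range(1, n + 1):
              for j in range(max(0, i - max_len), i):
                  if best[j] is None:
                      continue
                  piece = text[j:i]
                  cost = best[j][0] + (0 if piece in vocab else len(piece))
                  if best[i] is None or cost < best[i][0]:
                      best[i] = (cost, j)
          words, i = [], n
          while i > 0:
              j = best[i][1]
              words.append(text[j:i])
              i = j
          return " ".join(reversed(words))

      web_vocab = {"machine", "translation", "word", "spacing"}
      print(space_words("wordspacing", web_vocab))   # -> "word spacing"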

  • Topic Document Model Approach for Naive Bayes Text Classification

    Sang-Bum KIM, Hae-Chang RIM, Jin-Dong KIM

    LETTER-Natural Language Processing
    Vol: E88-D No:5  Page(s): 1091-1094

    The multinomial naive Bayes model has been widely used for probabilistic text classification. However, the parameter estimation for this model sometimes generates inappropriate probabilities. In this paper, we propose a topic document model for the multinomial naive Bayes text classification, where the parameters are estimated from normalized term frequencies of each training document. Experiments are conducted on Reuters 21578 and 20 Newsgroup collections, and our proposed approach obtained a significant improvement in performance compared to the traditional multinomial naive Bayes.
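    A small sketch in the spirit of the described estimation: class-conditional parameters are computed from per-document normalized term frequencies rather than pooled raw counts. The smoothing and the absence of class priors are simplifications, not the paper's exact formulation:

      import math
      from collections import defaultdict

      def train(docs_by_class, alpha=0.01):
          vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
          params = {}
          for c, docs in docs_by_class.items():
              avg_tf = defaultdict(float)
              for d in docs:
                  for w in d:
                      avg_tf[w] += 1.0 / (len(d) * len(docs))   # normalized TF, averaged
              denom = sum(avg_tf.values()) + alpha * len(vocab)
              params[c] = {w: (avg_tf[w] + alpha) / denom for w in vocab}
          return params

      def classify(doc, params):
          return max(params, key=lambda c: sum(math.log(params[c][w])
                                               for w in doc if w in params[c]))

      model = train({"sports": [["goal", "match"], ["team", "goal"]],
                     "tech":   [["cpu", "chip"], ["chip", "code"]]})
      print(classify(["goal", "team"], model))   # -> "sports"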

  • Semantic Classification of Bio-Entities Incorporating Predicate-Argument Features

    Kyung-Mi PARK, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E91-D No:4  Page(s): 1211-1214

    In this paper, we propose new external context features for the semantic classification of bio-entities. In previous approaches, the words in the left or right context of a bio-entity are frequently used as external context features. However, in our prior experiments, external contexts in such a flat representation did not improve performance. In this study, we incorporate predicate-argument features into the training of a maximum entropy (ME) based classifier. Through parsing and argument identification, we recognize biomedical verbs that have argument relations with the constituents containing a bio-entity, and then use the predicate-argument structures as external context features. Extracting predicate-argument features involves two identification tasks: biomedically salient word identification, which determines whether a word is biomedically salient, and target verb identification, which finds the biomedical verbs that have argument relations with the constituents containing a bio-entity. Experiments show that the performance of semantic classification in the biomedical domain can be improved by utilizing such predicate-argument features.

  • Incorporating Frame Information to Semantic Role Labeling

    Joo-Young LEE, Young-In SONG, Hae-Chang RIM, Kyoung-Soo HAN

    LETTER-Natural Language Processing
    Vol: E93-D No:1  Page(s): 201-204

    In this paper, we suggest a new probabilistic model for semantic role labeling that uses the frameset of the predicate as explicit linguistic knowledge, providing global information on the predicate-argument structure that a local classifier is unable to capture. The proposed model consists of three sub-models: a role sequence generation model, a frameset generation model, and a matching model. The role sequence generation model generates semantic role sequence candidates for a given predicate using the local classification approach widely adopted in previous research. The frameset generation model estimates the probability of each frameset that the predicate can take. The matching model measures the degree of matching between a generated role sequence and a frameset using several features developed to represent the predicate-argument structure information described in the frameset. In the experiments, our model shows that using knowledge about the predicate-argument structure is effective for selecting a more appropriate semantic role sequence.
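    A toy sketch of how the three sub-models could be combined; the scoring functions are placeholders, and even the multiplicative combination is an assumption rather than the paper's formulation:

      def best_role_sequence(candidates, framesets, seq_prob, frameset_prob, match_score):
          """Pick the role sequence whose combined score over the framesets is highest."""
          best, best_score = None, float("-inf")
          for seq in candidates:                 # from the role sequence generation model
              for fs in framesets:               # from the frameset generation model
                  score = seq_prob(seq) * frameset_prob(fs) * match_score(seq, fs)
                  if score > best_score:
                      best, best_score = seq, score
          return best

      print(best_role_sequence(
          candidates=[("ARG0", "ARG1"), ("ARG0", "ARGM-TMP")],
          framesets=["give.01"],
          seq_prob=lambda s: 0.6 if s == ("ARG0", "ARG1") else 0.4,
          frameset_prob=lambda f: 1.0,
          match_score=lambda s, f: 1.0 if "ARG1" in s else 0.2))   # -> ('ARG0', 'ARG1')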

  • Utilizing Global Syntactic Tree Features for Phrase Reordering

    Yeon-Soo LEE, Hyoung-Gyu LEE, Hae-Chang RIM, Young-Sook HWANG

    LETTER-Natural Language Processing
    Vol: E97-D No:6  Page(s): 1694-1698

    In phrase-based statistical machine translation, the long-distance reordering problem is one of the most challenging issues when translating syntactically distant language pairs. In this paper, we propose a novel reordering model to address this problem. In our model, reordering is governed by the overall structure of a sentence, such as listings, reduplications, and modifications, as well as by the relationships between adjacent phrases. To this end, we incorporate global syntactic contexts, including the parts that have not yet been translated, during the decoding process.

  • Automatic Acronym Dictionary Construction Based on Acronym Generation Types

    Yeo-Chan YOON, So-Young PARK, Young-In SONG, Hae-Chang RIM, Dae-Woong RHEE

    LETTER-Natural Language Processing
    Vol: E91-D No:5  Page(s): 1584-1587

    In this paper, we propose a new model for automatically constructing an acronym dictionary. The proposed model generates possible acronym candidates from a definition and then verifies each acronym-definition pair with a Naive Bayes classifier based on web documents. In order to achieve high dictionary quality, the proposed model utilizes the characteristics of three acronym generation types: syllable-based generation, word-based generation, and mixed generation. Compared with a previous model that recognizes an acronym-definition pair in a single document, the proposed model, which verifies a pair against web documents, improves recall by approximately 50% when obtaining acronym-definition pairs from 314 Korean definitions. The proposed model also improves F-measure by 7.25% when verifying acronym-definition candidate pairs, by using classifiers specialized to the characteristics of each acronym generation type.
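    A toy generator for word-based acronym candidates from an English-like definition; the paper also handles Korean syllable-based and mixed generation types, which are omitted here, and the stopword list is hypothetical:

      # Each generated candidate would then be paired with the definition and verified
      # with a classifier over Web documents (the verification step is not shown).
      def word_based_candidates(definition):
          words = definition.split()
          candidates = {"".join(w[0].upper() for w in words)}
          # optionally drop function words (hypothetical stopword list)
          content = [w for w in words if w.lower() not in {"of", "the", "and", "for"}]
          candidates.add("".join(w[0].upper() for w in content))
          return candidates

      print(word_based_candidates("Association for Computational Linguistics"))
      # -> {'AFCL', 'ACL'}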

  • A New Probabilistic Dependency Parsing Model for Head-Final, Free Word Order Languages

    Hoojung CHUNG, Hae-Chang RIM

    LETTER-Natural Language Processing
    Vol: E86-D No:11  Page(s): 2490-2493

    We propose a dependency parsing model for head-final, variable word order languages. Based on the observation that each word has its own preference for its modifying distance, and that the preferred distance varies according to the surrounding context of the word, we define a parsing model that reflects this preference. Experimental results show that a parser based on our model outperforms other parsers in terms of precision and recall.
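    A rough sketch of estimating a word's modifying-distance preference from corpus counts, conditioned on a coarse context feature; the event definition and smoothing are assumptions, not the paper's model:

      from collections import Counter, defaultdict

      counts = defaultdict(Counter)          # counts[(word, context)][distance]

      def observe(word, context, distance):
          counts[(word, context)][distance] += 1

      def distance_prob(word, context, distance, alpha=0.5, max_dist=10):
          c = counts[(word, context)]
          # additive smoothing over a fixed range of candidate distances
          return (c[distance] + alpha) / (sum(c.values()) + alpha * max_dist)

      observe("quickly", "before_verb", 1)
      observe("quickly", "before_verb", 1)
      observe("quickly", "before_verb", 3)
      print(distance_prob("quickly", "before_verb", 1))   # preferred short distance
      print(distance_prob("quickly", "before_verb", 3))   # less preferred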

  • Computing Word Semantic Relatedness for Question Retrieval in Community Question Answering

    Jung-Tae LEE, Young-In SONG, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems
    Vol: E92-D No:4  Page(s): 736-739

    Previous approaches to question retrieval in community-based question answering rely on statistical translation techniques to match users' questions (queries) against collections of previously asked questions. This paper presents a simple but effective method for computing word relatedness to improve question retrieval based on word co-occurrence information directly extracted from question and answer archives. Experimental results show that the proposed approach significantly outperforms translation-based approaches.
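    A small sketch of word relatedness computed from question-answer co-occurrence, scored here with pointwise mutual information; PMI is used only for illustration, since the paper's exact relatedness measure is not reproduced here:

      import math
      from collections import Counter

      def build_relatedness(qa_pairs):
          q_counts, a_counts, pair_counts, n = Counter(), Counter(), Counter(), 0
          for question, answer in qa_pairs:
              n += 1
              q_words, a_words = set(question.split()), set(answer.split())
              q_counts.update(q_words)
              a_counts.update(a_words)
              pair_counts.update((qw, aw) for qw in q_words for aw in a_words)

          def relatedness(qw, aw):            # PMI between a question word and an answer word
              if pair_counts[(qw, aw)] == 0:
                  return 0.0
              return math.log(pair_counts[(qw, aw)] * n / (q_counts[qw] * a_counts[aw]))

          return relatedness

      rel = build_relatedness([("best python ide", "try pycharm"),
                               ("python debugger setup", "pycharm has one built in"),
                               ("how to bake bread", "use a dutch oven")])
      print(rel("python", "pycharm"))   # positive: the two words co-occur consistently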

  • Simple Weighting Techniques for Query Expansion in Biomedical Document Retrieval

    Young-In SONG, Kyoung-Soo HAN, So-Young PARK, Sang-Bum KIM, Hae-Chang RIM

    LETTER-Contents Technology and Web Information Systems
    Vol: E90-D No:11  Page(s): 1873-1876

    In this paper, we propose two weighting techniques to improve the performance of query expansion in biomedical document retrieval, especially when a short biomedical term in a query is expanded with synonymous multi-word terms. When a query contains synonymous terms of different lengths, a traditional IR model ranks documents containing a longer terminology higher, because a longer terminology has more chances to be matched with the query. However, this preference is clearly inappropriate and often yields unsatisfactory results. To alleviate the weighting bias, we devise a method for normalizing the weights of query terms within a long multi-word biomedical term, and a method for discriminating terms by using inverse terminology frequency, a novel statistic estimated in the query domain. Experimental results on the MEDLINE corpus show that these two simple techniques improve retrieval performance by adjusting the inadequate preference for long multi-word terminologies in an expanded query.
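    A minimal sketch of the two weighting ideas: splitting an expansion term's weight across the words of a long synonymous multi-word term, and down-weighting terms via an inverse terminology frequency factor. Both formulas are assumptions for illustration; the paper's definitions may differ:

      import math

      def expand_query(term, synonyms, terminology_freq, num_terminologies):
          weights = {term: 1.0}
          for synonym in synonyms:
              words = synonym.split()
              # inverse terminology frequency: rarer terminology entries get more weight
              itf = math.log(num_terminologies / (1 + terminology_freq.get(synonym, 0)))
              for w in words:
                  # length normalization: a long synonym does not dominate the match score
                  weights[w] = weights.get(w, 0.0) + itf / len(words)
          return weights

      print(expand_query("PKD", ["polycystic kidney disease", "APKD"],
                         terminology_freq={"polycystic kidney disease": 3, "APKD": 1},
                         num_terminologies=100))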

  • Estimating Translation Probabilities Considering Semantic Recoverability of Phrase Retranslation

    Hyoung-Gyu LEE, Min-Jeong KIM, YingXiu QUAN, Hae-Chang RIM, So-Young PARK

    LETTER-Natural Language Processing
    Vol: E95-D No:3  Page(s): 897-901

    The general method for estimating phrase translation probabilities consists of sequential processes: word alignment, phrase pair extraction, and phrase translation probability calculation. During this sequence, however, errors may propagate from the word alignment step to the translation probability calculation step. In this paper, we propose a new method for estimating phrase translation probabilities that reduces the effects of this error propagation. By considering the semantic recoverability of phrase retranslation, our method identifies incorrect phrase pairs caused by alignment errors. Furthermore, we define retranslation similarity, which represents the semantic recoverability of phrase retranslation, and use it when computing translation probabilities. Experimental results show that the proposed estimation method effectively prevents a PBSMT system from selecting incorrect phrase pairs and consistently improves translation quality across various language pairs.
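    A sketch of the retranslation check under strong simplifications: the target phrase is translated back into the source language and compared with the original source phrase, and the resulting similarity scales the translation probability. The token-overlap similarity and the weighting are assumptions, not the paper's definitions:

      def retranslation_similarity(src_phrase, tgt_phrase, back_translate):
          """Jaccard overlap between the source phrase and the retranslated target phrase."""
          back = back_translate(tgt_phrase)
          src_tokens, back_tokens = set(src_phrase.split()), set(back.split())
          if not src_tokens or not back_tokens:
              return 0.0
          return len(src_tokens & back_tokens) / len(src_tokens | back_tokens)

      def weighted_translation_prob(count, total, similarity):
          return (count / total) * similarity   # low-recoverability pairs are penalized

      back = lambda phrase: {"maison bleue": "blue house"}.get(phrase, "")
      print(retranslation_similarity("blue house", "maison bleue", back))   # 1.0
      print(weighted_translation_prob(3, 10, similarity=1.0))               # 0.3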