Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging

Seung-Hoon NA; Young-Kil KIM

doi:10.1587/transinf.2017EDP7085

Summary:

In this paper, we propose a novel phrase-based model for Korean morphological analysis that takes the phrase as the basic processing unit, generalizing all other existing processing units. The use of phrases in this way is motivated largely by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.

Publication: IEICE TRANSACTIONS on Information Vol.E101-D No.2 pp.512-522

Publication Date: 2018/02/01

Publicized: 2017/11/13

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017EDP7085

Type of Manuscript: PAPER

Category: Natural Language Processing

Authors

Seung-Hoon NA
Chonbuk National University
Young-Kil KIM
Electronics and Telecommunications Research Institute

Keywords

phrase-based model, segmentation, tagging, morphological analysis, Korean morphological analysis

Seung-Hoon NA, Young-Kil KIM, "Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 2, pp. 512-522, February 2018, doi: 10.1587/transinf.2017EDP7085.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7085/_p

@ARTICLE{e101-d_2_512,
author={Seung-Hoon NA and Young-Kil KIM},
journal={IEICE TRANSACTIONS on Information},
title={Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging},
year={2018},
volume={E101-D},
number={2},
pages={512--522},
abstract={In this paper, we propose a novel phrase-based model for Korean morphological analysis by considering a phrase as the basic processing unit, which generalizes all the other existing processing units. The impetus for using phrases this way is largely motivated by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.},
keywords={phrase-based model, segmentation, tagging, morphological analysis, Korean morphological analysis},
doi={10.1587/transinf.2017EDP7085},
ISSN={1745-1361},
month={February}
}

TY - JOUR
TI - Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging
T2 - IEICE TRANSACTIONS on Information
SP - 512
EP - 522
AU - Seung-Hoon NA
AU - Young-Kil KIM
PY - 2018
DO - 10.1587/transinf.2017EDP7085
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2018
AB - In this paper, we propose a novel phrase-based model for Korean morphological analysis by considering a phrase as the basic processing unit, which generalizes all the other existing processing units. The impetus for using phrases this way is largely motivated by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.
ER -