In this paper, we propose a novel phrase-based model for Korean morphological analysis by considering a phrase as the basic processing unit, which generalizes all the other existing processing units. The impetus for using phrases this way is largely motivated by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.
Seung-Hoon NA
Chonbuk National University
Young-Kil KIM
Electronics and Telecommunications Research Institute
Seung-Hoon NA, Young-Kil KIM, "Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging" in IEICE TRANSACTIONS on Information, vol. E101-D, no. 2, pp. 512-522, February 2018, doi: 10.1587/transinf.2017EDP7085.
Abstract: In this paper, we propose a novel phrase-based model for Korean morphological analysis by considering a phrase as the basic processing unit, which generalizes all the other existing processing units. The impetus for using phrases this way is largely motivated by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7085/_p
@ARTICLE{e101-d_2_512,
author={Seung-Hoon NA and Young-Kil KIM},
journal={IEICE TRANSACTIONS on Information},
title={Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging},
year={2018},
volume={E101-D},
number={2},
pages={512-522},
abstract={In this paper, we propose a novel phrase-based model for Korean morphological analysis by considering a phrase as the basic processing unit, which generalizes all the other existing processing units. The impetus for using phrases this way is largely motivated by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.},
keywords={},
doi={10.1587/transinf.2017EDP7085},
ISSN={1745-1361},
month={February},
}
TY - JOUR
TI - Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging
T2 - IEICE TRANSACTIONS on Information
SP - 512
EP - 522
AU - Seung-Hoon NA
AU - Young-Kil KIM
PY - 2018
DO - 10.1587/transinf.2017EDP7085
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2018
AB - In this paper, we propose a novel phrase-based model for Korean morphological analysis by considering a phrase as the basic processing unit, which generalizes all the other existing processing units. The impetus for using phrases this way is largely motivated by the success of phrase-based statistical machine translation (SMT), which convincingly shows that the larger the processing unit, the better the performance. Experimental results using the SEJONG dataset show that the proposed phrase-based models outperform the morpheme-based models used as baselines. In particular, when combined with the conditional random field (CRF) model, our model leads to statistically significant improvements over the state-of-the-art CRF method.
ER -