Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model

Canasai KRUENGKRAI; Kiyotaka UCHIMOTO; Jun'ichi KAZAMA; Yiou WANG; Kentaro TORISAWA; Hitoshi ISAHARA

doi:10.1587/transinf.E92.D.2298

Summary:

In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. The hybrid model offers high performance because it can handle both known and unknown words. We describe strategies that strike a good balance in learning the characteristics of known and unknown words, and we propose an error-driven policy that achieves this balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
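The summary names MIRA as the training algorithm but gives no details. As a rough illustration only, the following is a minimal sketch of a generic 1-best MIRA update over sparse feature vectors. It is not the authors' implementation: the function names, the feature strings, the loss value, and the cap C are all assumptions made for this example. The step size is the smallest one that makes the gold analysis outscore the predicted one by at least the loss, clipped to [0, C].

```python
from collections import Counter


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature-count dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Apply one 1-best MIRA step in place and return the step size tau.

    gold_feats / pred_feats: feature counts of the gold and predicted
    analyses (hypothetical feature names, for illustration only).
    loss: task loss of the prediction, e.g. a count of wrong decisions
    (an assumption here, not taken from the paper).
    """
    # Difference vector Phi(gold) - Phi(pred), dropping zero entries.
    diff = Counter(gold_feats)
    diff.subtract(pred_feats)
    diff = {k: v for k, v in diff.items() if v != 0}

    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0:
        return 0.0  # identical feature vectors: nothing to update

    margin = dot(weights, diff)  # score(gold) - score(pred)
    tau = max(0.0, min(C, (loss - margin) / norm_sq))

    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau


# Toy usage with made-up features for one sentence.
w = {}
gold = {"word=A/NN": 1, "word=B/VV": 1}
pred = {"word=A/NN": 1, "char=B/S-NN": 1}
tau = mira_update(w, gold, pred, loss=1.0)
```

After this single step the gold analysis outscores the prediction by exactly the loss, so repeating the same update leaves the weights unchanged.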

Publication: IEICE TRANSACTIONS on Information, Vol. E92-D, No. 12, pp. 2298-2305

Publication Date: 2009/12/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E92.D.2298

Type of Manuscript: Special Section PAPER (Special Section on Natural Language Processing and its Applications)

Category: Morphological/Syntactic Analysis

Canasai KRUENGKRAI, Kiyotaka UCHIMOTO, Jun'ichi KAZAMA, Yiou WANG, Kentaro TORISAWA, Hitoshi ISAHARA, "Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model" in IEICE TRANSACTIONS on Information, vol. E92-D, no. 12, pp. 2298-2305, December 2009, doi: 10.1587/transinf.E92.D.2298.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2298/_p

@ARTICLE{e92-d_12_2298,
author={Canasai KRUENGKRAI and Kiyotaka UCHIMOTO and Jun'ichi KAZAMA and Yiou WANG and Kentaro TORISAWA and Hitoshi ISAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model},
year={2009},
volume={E92-D},
number={12},
pages={2298--2305},
keywords={},
doi={10.1587/transinf.E92.D.2298},
ISSN={1745-1361},
month={December},
}

TY - JOUR
TI - Joint Chinese Word Segmentation and POS Tagging Using an Error-Driven Word-Character Hybrid Model
T2 - IEICE TRANSACTIONS on Information
SP - 2298
EP - 2305
AU - Canasai KRUENGKRAI
AU - Kiyotaka UCHIMOTO
AU - Jun'ichi KAZAMA
AU - Yiou WANG
AU - Kentaro TORISAWA
AU - Hitoshi ISAHARA
PY - 2009
DO - 10.1587/transinf.E92.D.2298
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E92-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2009
ER -