Character-Level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

Zhen GUO; Yujie ZHANG; Chen SU; Jinan XU; Hitoshi ISAHARA

doi:10.1587/transinf.2015EDP7118

Summary:

Recent work on joint word segmentation, POS (part-of-speech) tagging, and dependency parsing in Chinese has two key problems. First, character-based word segmentation and word-based dependency parsing are not combined well in the transition-based framework. Second, the joint model suffers from a shortage of annotated corpora. To resolve the first problem, we transform the traditional word-based dependency tree into a character-based dependency tree by using the internal structure of words, and then propose a novel character-level joint model for the three tasks. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieves 98.31%, 94.84%, and 81.71% on Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model of the three tasks by 0.92%, 1.77%, and 3.95%, respectively. In particular, the F1 values for word segmentation and POS tagging are the best among those reported to date.
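
The word-to-character tree transformation described in the summary can be illustrated with a minimal Python sketch. The summary does not specify the word-internal structures used, so this sketch makes a simplifying assumption that is not taken from the paper: every word is treated as a right-headed chain, where each character depends on the next character in the same word and the last character stands in for the whole word. The function name `word_tree_to_char_tree` and the input encoding (a list of words plus a list of head indices, with -1 marking the root) are hypothetical.

```python
from typing import List, Tuple


def word_tree_to_char_tree(words: List[str], heads: List[int]) -> List[Tuple[str, int]]:
    """Convert a word-level dependency tree into a character-level one.

    words: the segmented sentence (non-empty words).
    heads: heads[i] is the index of the head word of words[i], or -1 for the root.
    Returns a list of (character, head_character_index) pairs, with -1 for the root.

    ASSUMPTION (not from the paper): each word is a right-headed chain, so every
    character depends on the next character of its word, and the last character
    of a word carries the word's original inter-word dependency.
    """
    # Index of the last character of each word in the flattened character sequence.
    last_char = []
    position = 0
    for word in words:
        position += len(word)
        last_char.append(position - 1)

    char_tree = []
    for i, word in enumerate(words):
        start = last_char[i] - len(word) + 1
        for offset, ch in enumerate(word):
            index = start + offset
            if index < last_char[i]:
                # Intra-word arc: the character depends on its right neighbour.
                char_tree.append((ch, index + 1))
            else:
                # Inter-word arc: the word's head character inherits the word-level head.
                head_word = heads[i]
                char_tree.append((ch, -1 if head_word < 0 else last_char[head_word]))
    return char_tree


if __name__ == "__main__":
    # Toy sentence with an illustrative (not treebank-derived) dependency chain.
    tree = word_tree_to_char_tree(["中国", "经济", "发展"], [1, 2, -1])
    for idx, (ch, head) in enumerate(tree):
        print(idx, ch, head)
```

In such a representation, word boundaries can be read off from which arcs are intra-word and which are inter-word, which is one way a single character-level parser could cover segmentation and parsing together.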

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.1 pp.257-264

Publication Date: 2016/01/01

Publicized: 2015/10/06

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2015EDP7118

Type of Manuscript: PAPER

Category: Natural Language Processing

Authors

Zhen GUO
  Beijing Jiaotong University
Yujie ZHANG
  Beijing Jiaotong University
Chen SU
  Beijing Jiaotong University
Jinan XU
  Beijing Jiaotong University
Hitoshi ISAHARA
  Toyohashi University of Technology

Keywords

joint model, Chinese word segmentation and POS tagging, dependency parsing, word internal dependency structure, semi-supervised learning

Cite this

Zhen GUO, Yujie ZHANG, Chen SU, Jinan XU, Hitoshi ISAHARA, "Character-Level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 1, pp. 257-264, January 2016, doi: 10.1587/transinf.2015EDP7118.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015EDP7118/_p

@ARTICLE{e99-d_1_257,
author={Zhen GUO and Yujie ZHANG and Chen SU and Jinan XU and Hitoshi ISAHARA},
journal={IEICE TRANSACTIONS on Information},
title={Character-Level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese},
year={2016},
volume={E99-D},
number={1},
pages={257-264},
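keywords={joint model, Chinese word segmentation and POS tagging, dependency parsing, word internal dependency structure, semi-supervised learning},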
doi={10.1587/transinf.2015EDP7118},
ISSN={1745-1361},
month={January},}

TY - JOUR
TI - Character-Level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
T2 - IEICE TRANSACTIONS on Information
SP - 257
EP - 264
AU - Zhen GUO
AU - Yujie ZHANG
AU - Chen SU
AU - Jinan XU
AU - Hitoshi ISAHARA
PY - 2016
DO - 10.1587/transinf.2015EDP7118
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2016
ER -