Correcting Syntactic Annotation Errors Based on Tree Mining

Kanta SUZUKI; Yoshihide KATO; Shigeki MATSUBARA

doi:10.1587/transinf.2016EDP7357

Summary:

This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that if an infrequent pattern can be transformed to a frequent one, then it is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision.
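The core idea in the summary (an infrequent pattern that can be transformed into a frequent one is treated as a likely annotation error) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use FREQT or synchronous tree substitution grammar. It assumes toy depth-one patterns, a hypothetical notion of "transformable" (exactly one label differs), and hypothetical thresholds `max_rare` and `min_frequent`.

```python
from collections import Counter

# Toy "patterns": depth-one subtrees written as (parent_label, child_labels).
# Real tree mining (e.g. FREQT) enumerates far richer subtree patterns.
Pattern = tuple[str, tuple[str, ...]]


def differs_by_one_label(a: Pattern, b: Pattern) -> bool:
    """Assumed notion of 'transformable': same shape, exactly one label differs."""
    labels_a = (a[0], *a[1])
    labels_b = (b[0], *b[1])
    if len(labels_a) != len(labels_b):
        return False
    return sum(x != y for x, y in zip(labels_a, labels_b)) == 1


def propose_rules(patterns, max_rare=1, min_frequent=3):
    """Pair each rare pattern with every frequent pattern it can be turned into.

    The thresholds are hypothetical illustration values, not from the paper.
    """
    counts = Counter(patterns)
    rare = [p for p, c in counts.items() if c <= max_rare]
    frequent = [p for p, c in counts.items() if c >= min_frequent]
    return [(r, f) for r in rare for f in frequent if differs_by_one_label(r, f)]


observed = [
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "VB")),   # rare; one label away from the frequent pattern
    ("VP", ("VBD", "NP")),  # rare, but no frequent neighbour, so no rule
]
rules = propose_rules(observed)
```

Here `rules` holds a single candidate rule that rewrites the rare `NP -> DT VB` pattern to the frequent `NP -> DT NN` pattern. The rare `VP` pattern yields no rule because no frequent pattern is reachable from it under the assumed transformation.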

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.5 pp.1106-1113

Publication Date: 2017/05/01

Publicized: 2017/01/23

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016EDP7357

Type of Manuscript: PAPER

Category: Natural Language Processing

Authors

Kanta SUZUKI
  Nagoya University
Yoshihide KATO
  Nagoya University
Shigeki MATSUBARA
  Nagoya University

Keywords

error correction, synchronous tree substitution grammar, FREQT

Cite this

Kanta SUZUKI, Yoshihide KATO, Shigeki MATSUBARA, "Correcting Syntactic Annotation Errors Based on Tree Mining" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 5, pp. 1106-1113, May 2017, doi: 10.1587/transinf.2016EDP7357.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7357/_p

@ARTICLE{e100-d_5_1106,
author={Kanta SUZUKI and Yoshihide KATO and Shigeki MATSUBARA},
journal={IEICE TRANSACTIONS on Information},
title={Correcting Syntactic Annotation Errors Based on Tree Mining},
year={2017},
volume={E100-D},
number={5},
pages={1106--1113},
abstract={This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that if an infrequent pattern can be transformed to a frequent one, then it is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision.},
keywords={error correction, synchronous tree substitution grammar, FREQT},
doi={10.1587/transinf.2016EDP7357},
ISSN={1745-1361},
month={May},}

TY - JOUR
TI - Correcting Syntactic Annotation Errors Based on Tree Mining
T2 - IEICE TRANSACTIONS on Information
SP - 1106
EP - 1113
AU - Kanta SUZUKI
AU - Yoshihide KATO
AU - Shigeki MATSUBARA
PY - 2017
DO - 10.1587/transinf.2016EDP7357
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2017
AB - This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that if an infrequent pattern can be transformed to a frequent one, then it is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision.
ER -