This paper proposes a new method for correcting annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus in which incorrect partial parse trees are paired with correct ones, extracts error correction rules from that corpus, and corrects errors by applying the rules to a treebank. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method takes a different approach: we consider an infrequent pattern to be an annotation error pattern if it can be transformed into a frequent one. Using a tree mining technique, our method finds such infrequent tree patterns and constructs error correction rules, each of which consists of an infrequent pattern and its corresponding frequent pattern. In an experiment on the Penn Treebank, we obtained 1,987 rules that the previous method does not construct, and these rules achieved good precision.
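The core idea in the abstract (an infrequent pattern that can be transformed into a frequent one is treated as an error pattern) can be illustrated with a toy sketch. This is not the authors' tree mining algorithm: it assumes patterns are single parent-children productions, and it assumes a "transformation" means changing exactly one label. The names `productions`, `one_label_apart`, `extract_rules`, and the thresholds `frequent_min` and `infrequent_max` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Toy tree encoding (an assumption for this sketch): (label, [children]);
# a leaf is (label, []).
def leaf(tag):
    return (tag, [])

def productions(tree):
    """Yield (parent label, tuple of child labels) for every internal node."""
    label, children = tree
    if not children:
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from productions(child)

def one_label_apart(p, q):
    """True if two productions have the same arity and differ in exactly one label."""
    (p_parent, p_kids), (q_parent, q_kids) = p, q
    if len(p_kids) != len(q_kids):
        return False
    diffs = (p_parent != q_parent) + sum(a != b for a, b in zip(p_kids, q_kids))
    return diffs == 1

def extract_rules(treebank, frequent_min=3, infrequent_max=1):
    """Pair each infrequent production with every frequent one it can be turned into."""
    counts = Counter(p for tree in treebank for p in productions(tree))
    frequent = [p for p, n in counts.items() if n >= frequent_min]
    rules = []
    for p, n in counts.items():
        if n <= infrequent_max:
            rules.extend((p, q) for q in frequent if one_label_apart(p, q))
    return rules

# Three consistent annotations and one that differs only in the parent label.
good = ("NP", [leaf("DT"), leaf("JJ"), leaf("NN")])
bad = ("ADJP", [leaf("DT"), leaf("JJ"), leaf("NN")])
treebank = [good, good, good, bad]

rules = extract_rules(treebank)
for infrequent, frequent in rules:
    print(infrequent, "->", frequent)
```

Each resulting pair plays the role of an error correction rule: the left side is the suspicious infrequent pattern and the right side is the frequent pattern it would be rewritten to. A real system would mine larger subtree patterns and would need to validate rules before applying them.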
Kanta SUZUKI
Nagoya University
Yoshihide KATO
Nagoya University
Shigeki MATSUBARA
Nagoya University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kanta SUZUKI, Yoshihide KATO, Shigeki MATSUBARA, "Correcting Syntactic Annotation Errors Based on Tree Mining" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 5, pp. 1106-1113, May 2017, doi: 10.1587/transinf.2016EDP7357.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7357/_p
@ARTICLE{e100-d_5_1106,
author={Kanta SUZUKI and Yoshihide KATO and Shigeki MATSUBARA},
journal={IEICE TRANSACTIONS on Information},
title={Correcting Syntactic Annotation Errors Based on Tree Mining},
year={2017},
volume={E100-D},
number={5},
pages={1106--1113},
abstract={This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that if an infrequent pattern can be transformed to a frequent one, then it is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision.},
keywords={},
doi={10.1587/transinf.2016EDP7357},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - Correcting Syntactic Annotation Errors Based on Tree Mining
T2 - IEICE TRANSACTIONS on Information
SP - 1106
EP - 1113
AU - Kanta SUZUKI
AU - Yoshihide KATO
AU - Shigeki MATSUBARA
PY - 2017
DO - 10.1587/transinf.2016EDP7357
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2017
AB - This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that if an infrequent pattern can be transformed to a frequent one, then it is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision.
ER -