The Effect of Corpus Size on Case Frame Acquisition for Predicate-Argument Structure Analysis

Ryohei SASANO; Daisuke KAWAHARA; Sadao KUROHASHI

doi:10.1587/transinf.E93.D.1361

Summary:

This paper reports the effect of corpus size on case frame acquisition for predicate-argument structure analysis in Japanese. For this study, we collect a Japanese corpus consisting of up to 100 billion words, and construct case frames from corpora of six different sizes. Then, we apply these case frames to syntactic and case structure analysis, and zero anaphora resolution, in order to investigate the relationship between the corpus size for case frame acquisition and the performance of predicate-argument structure analysis. We obtained better analyses by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.

Publication: IEICE TRANSACTIONS on Information Vol.E93-D No.6 pp.1361-1368

Publication Date: 2010/06/01

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E93.D.1361

Type of Manuscript: Special Section PAPER (Special Section on Info-Plosion)

Category: Natural Language Processing

Ryohei SASANO, Daisuke KAWAHARA, Sadao KUROHASHI, "The Effect of Corpus Size on Case Frame Acquisition for Predicate-Argument Structure Analysis" in IEICE TRANSACTIONS on Information, vol. E93-D, no. 6, pp. 1361-1368, June 2010, doi: 10.1587/transinf.E93.D.1361.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E93.D.1361/_p

@ARTICLE{e93-d_6_1361,
author={Ryohei SASANO and Daisuke KAWAHARA and Sadao KUROHASHI},
journal={IEICE TRANSACTIONS on Information},
title={The Effect of Corpus Size on Case Frame Acquisition for Predicate-Argument Structure Analysis},
year={2010},
volume={E93-D},
number={6},
pages={1361--1368},
abstract={This paper reports the effect of corpus size on case frame acquisition for predicate-argument structure analysis in Japanese. For this study, we collect a Japanese corpus consisting of up to 100 billion words, and construct case frames from corpora of six different sizes. Then, we apply these case frames to syntactic and case structure analysis, and zero anaphora resolution, in order to investigate the relationship between the corpus size for case frame acquisition and the performance of predicate-argument structure analysis. We obtained better analyses by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.},
keywords={},
doi={10.1587/transinf.E93.D.1361},
ISSN={1745-1361},
month={June},}

TY  - JOUR
TI  - The Effect of Corpus Size on Case Frame Acquisition for Predicate-Argument Structure Analysis
T2  - IEICE TRANSACTIONS on Information
SP  - 1361
EP  - 1368
AU  - Ryohei SASANO
AU  - Daisuke KAWAHARA
AU  - Sadao KUROHASHI
PY  - 2010
DO  - 10.1587/transinf.E93.D.1361
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E93-D
IS  - 6
JA  - IEICE TRANSACTIONS on Information
Y1  - June 2010
AB  - This paper reports the effect of corpus size on case frame acquisition for predicate-argument structure analysis in Japanese. For this study, we collect a Japanese corpus consisting of up to 100 billion words, and construct case frames from corpora of six different sizes. Then, we apply these case frames to syntactic and case structure analysis, and zero anaphora resolution, in order to investigate the relationship between the corpus size for case frame acquisition and the performance of predicate-argument structure analysis. We obtained better analyses by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.
ER  - 