Ryohei SASANO Daisuke KAWAHARA Sadao KUROHASHI
This paper reports the effect of corpus size on case frame acquisition for predicate-argument structure analysis in Japanese. For this study, we collect a Japanese corpus consisting of up to 100 billion words, and construct case frames from corpora of six different sizes. Then, we apply these case frames to syntactic and case structure analysis, and zero anaphora resolution, in order to investigate the relationship between the corpus size for case frame acquisition and the performance of predicate-argument structure analysis. We obtained better analyses by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 billion words.
A case structure expression is one of the most important forms for representing the meaning of a sentence. Case structure analysis is usually performed by consulting case frame information in a verb dictionary. However, this analysis is very difficult because of several problems, such as word sense ambiguity and structural ambiguity. The conventional approach to these problems is selectional restriction, but its usual realization, the semantic marker (SM) method, has a drawback: a trade-off between descriptive power and construction cost. In this paper, we propose a method of case structure analysis based on the examples in a case frame dictionary. This method uses a case frame dictionary that contains some typical example sentences for each case frame, and it selects a proper case frame for an input sentence by matching the input sentence against those examples. The best matching score, which is used to select a proper case frame for a predicate, can be regarded as the score for the case structure of that predicate. Therefore, when a sentence has two or more readings because of structural ambiguity, the best reading can be selected by evaluating the sum of the scores for the case structures of all predicates in the sentence. We report experiments showing that this method outperforms the conventional, coarse-grained SM method, and discuss the advantages of the example-based method over the SM method.
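The selection procedure described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the dictionary format, so everything here is an assumption made for illustration: the toy word classes in `WORD_CLASS`, the 1.0/0.5/0.0 similarity values, the `CASE_FRAMES` layout, and all function names are hypothetical and are not the authors' implementation.

```python
# Toy word classes standing in for a thesaurus (hypothetical data).
WORD_CLASS = {
    "bread": "food", "rice": "food", "apple": "food",
    "boy": "human", "teacher": "human", "student": "human",
    "book": "document", "letter": "document",
}

# Hypothetical case frame dictionary: each frame lists example words per case.
CASE_FRAMES = {
    "eat": [{"name": "eat:1",
             "cases": {"ga": ["boy", "student"], "wo": ["bread", "rice"]}}],
    "read": [{"name": "read:1",
              "cases": {"ga": ["teacher"], "wo": ["book", "letter"]}}],
    "pass": [{"name": "pass:hand_over",
              "cases": {"ga": ["teacher"], "wo": ["book"], "ni": ["student"]}},
             {"name": "pass:exam",
              "cases": {"ga": ["student"], "wo": ["exam"]}}],
}


def word_similarity(a, b):
    """Assumed measure: 1.0 for identical words, 0.5 for same class, else 0.0."""
    if a == b:
        return 1.0
    class_a, class_b = WORD_CLASS.get(a), WORD_CLASS.get(b)
    if class_a is not None and class_a == class_b:
        return 0.5
    return 0.0


def frame_score(arguments, cases):
    """Sum, over the input's cases, of the best similarity to any example word."""
    total = 0.0
    for case, word in arguments.items():
        examples = cases.get(case, [])
        if examples:
            total += max(word_similarity(word, e) for e in examples)
    return total


def best_frame(predicate, arguments, dictionary):
    """Return (frame name, score) of the best-matching case frame."""
    frames = dictionary.get(predicate, [])
    if not frames:
        return None, 0.0
    best = max(frames, key=lambda f: frame_score(arguments, f["cases"]))
    return best["name"], frame_score(arguments, best["cases"])


def reading_score(reading, dictionary):
    """Score a reading as the sum of best frame scores over all its predicates."""
    return sum(best_frame(pred, args, dictionary)[1] for pred, args in reading)


def best_reading(readings, dictionary):
    """Pick the reading with the highest total score."""
    return max(readings, key=lambda r: reading_score(r, dictionary))


# Two candidate readings of one structurally ambiguous sentence (toy data).
READING_A = [("eat", {"ga": "boy", "wo": "bread"}),
             ("read", {"ga": "teacher", "wo": "book"})]
READING_B = [("eat", {"ga": "boy", "wo": "book"}),
             ("read", {"ga": "teacher", "wo": "bread"})]

if __name__ == "__main__":
    print(best_frame("pass", {"ga": "boy", "wo": "letter", "ni": "teacher"},
                     CASE_FRAMES))
    print(best_reading([READING_A, READING_B], CASE_FRAMES) is READING_A)
```

In this toy setup, word sense ambiguity is handled by `best_frame`, which chooses among the frames of one predicate, and structural ambiguity is handled by `best_reading`, which compares the summed scores of competing readings. A real system would replace the assumed similarity function with a thesaurus-based or corpus-based measure.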