Tran Sy BANG Virach SORNLERTLAMVANICH
This paper presents a supervised method to classify a document at the sub-sentence level. Traditionally, sentiment analysis classifies sentence polarity based on word, syllable, or N-gram features. A sentence, as a whole, may contain several phrases and words that carry their own specific sentiment. However, classifying a sentence based on isolated phrases and words can be incoherent because, taken out of context, they are ungrammatically formed. To overcome this problem, we arrange words and phrases in a dependency form to capture the semantic scope of their sentiment. Thus, we transform a sentence into a dependency tree structure. A dependency tree is composed of subtrees, and each subtree arranges words and syllables in grammatical order. Moreover, a sentence dependency tree structure can mitigate word sense ambiguity and resolve the inherent polysemy of words by determining their word sense. In our experiment, we provide the details of the proposed subtree polarity classification for sub-opinion analysis. To conclude our discussion, we also elaborate on the effectiveness of the analysis results.
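A minimal sketch of the general idea of subtree polarity classification: enumerate the subtrees of a dependency tree and assign each a polarity. The hand-built tree, the toy lexicon, and the additive scoring rule below are illustrative placeholders, not the paper's supervised classifier.

```python
# Sketch: classify the polarity of every subtree in a dependency tree.
# The lexicon, tree, and scoring rule are toy stand-ins.
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str
    children: list["Node"] = field(default_factory=list)

# Hypothetical polarity lexicon (word -> score).
LEXICON = {"great": 1, "slow": -1, "love": 1, "crash": -1}

def subtree_words(node: Node) -> list[str]:
    """Collect all words in the subtree rooted at `node`."""
    words = [node.word]
    for child in node.children:
        words.extend(subtree_words(child))
    return words

def classify_subtrees(root: Node) -> list[tuple[str, str]]:
    """Assign a polarity label to every subtree of the dependency tree."""
    results, stack = [], [root]
    while stack:
        node = stack.pop()
        score = sum(LEXICON.get(w, 0) for w in subtree_words(node))
        label = "pos" if score > 0 else "neg" if score < 0 else "neutral"
        results.append((node.word, label))
        stack.extend(node.children)
    return results

# "love" heads "great screen"; "crash" heads "apps sometimes crash".
tree = Node("love", [Node("screen", [Node("great")]),
                     Node("crash", [Node("apps"), Node("sometimes")])])
print(classify_subtrees(tree))
```

Because each subtree is scored in its grammatical scope, the same sentence can yield a positive sub-opinion (the "screen" subtree) and a negative one (the "crash" subtree) at once.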
Zhen GUO Yujie ZHANG Chen SU Jinan XU Hitoshi ISAHARA
Recent work on joint word segmentation, POS (part-of-speech) tagging, and dependency parsing in Chinese faces two key problems: first, character-based word segmentation and word-based dependency parsing have not been combined well in the transition-based framework; second, the joint model suffers from an insufficiency of annotated corpora. To resolve the first problem, we propose transforming the traditional word-based dependency tree into a character-based dependency tree using the internal structure of words, and then propose a novel character-level joint model for the three tasks. To resolve the second problem, we propose a novel semi-supervised joint model that exploits n-gram features and dependency subtree features from a partially annotated corpus. Experimental results on the Chinese Treebank show that our joint model achieved 98.31%, 94.84%, and 81.71% for Chinese word segmentation, POS tagging, and dependency parsing, respectively, outperforming the pipeline model of the three tasks by 0.92%, 1.77%, and 3.95%. In particular, the F1 values for word segmentation and POS tagging are the best results reported to date.
Meixun JIN Yong-Hun LEE Jong-Hyeok LEE
This paper presents a new span-based dependency chart parsing algorithm that models the relations between the left and right dependents of a head. Such relations cannot be modeled in existing span-based algorithms, despite their prevalence in dependency corpora. We address this problem through ternary-span combination during subtree derivation. By modeling the relations between the left and right dependents of a head, the proposed algorithm provides better coordination disambiguation when the conjunction is annotated as the head of the left and right conjuncts. This leads to state-of-the-art dependency parsing performance on the Chinese data of the CoNLL shared task.
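A minimal sketch of why seeing both dependents of a head at once helps coordination. The toy weights below are illustrative, not the paper's trained chart-parsing model; the point is only that a ternary score can express a symmetry between conjuncts that a first-order, one-arc-at-a-time score cannot.

```python
# Sketch: arc-factored scoring vs. a ternary score over (left, head, right).
# Toy weights; not the paper's model.

def arc_score(head: str, dep: str) -> float:
    # First-order score: looks at one dependent at a time.
    return 1.0 if head == "and" else 0.0

def ternary_score(left: str, head: str, right: str) -> float:
    # Ternary score: can reward symmetric conjuncts around "and",
    # a relation no sum of independent arc scores can capture.
    bonus = 2.0 if head == "and" and left.endswith("s") == right.endswith("s") else 0.0
    return arc_score(head, left) + arc_score(head, right) + bonus

print(ternary_score("apples", "and", "oranges"))  # 4.0: symmetric conjuncts
print(ternary_score("apples", "and", "it"))       # 2.0: no symmetry bonus
```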
When a dependency parser analyzes long sentences with fewer subjects than predicates, it is difficult for it to recognize which predicate governs which subject. To handle such syntactic ambiguity between subjects and predicates, we define a "subject clause (S-clause)" as a group of words containing several predicates and their common subject. This paper proposes a two-phase method for S-clause segmentation: the first phase reduces the number of candidate S-clause boundaries, and the second performs S-clause segmentation using decision trees. In experimental evaluation, the S-clause information proved effective for determining the governor of a subject and that of a predicate in dependency parsing. Further syntactic analysis using S-clauses achieved an improvement in precision of 5 percent.
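A minimal two-phase sketch of S-clause segmentation as the abstract outlines it. Phase 1 keeps only plausible boundary candidates; phase 2 stands in for the paper's trained decision tree with a hand-written rule. The token tags and both heuristics are illustrative assumptions.

```python
# Sketch: two-phase S-clause segmentation over a toy POS-tag sequence.

def candidate_boundaries(tags: list[str]) -> list[int]:
    """Phase 1: a boundary may only follow a predicate (verb/adjective)."""
    return [i + 1 for i, t in enumerate(tags) if t in ("VERB", "ADJ")]

def is_boundary(tags: list[str], i: int) -> bool:
    """Phase 2 stand-in: split only if a new subject follows the candidate."""
    return i < len(tags) and tags[i] == "SUBJ"

def segment_s_clauses(tags: list[str]) -> list[tuple[int, int]]:
    """Return (start, end) spans of S-clauses over the tag sequence."""
    cuts = [i for i in candidate_boundaries(tags) if is_boundary(tags, i)]
    starts = [0] + cuts
    ends = cuts + [len(tags)]
    return list(zip(starts, ends))

# One shared subject with two predicates, then a new subject and predicate.
tags = ["SUBJ", "OBJ", "VERB", "VERB", "SUBJ", "VERB"]
print(segment_s_clauses(tags))  # [(0, 4), (4, 6)]
```

Filtering before classifying mirrors the paper's design: the cheap first phase prunes most positions so the second-phase classifier only decides the genuinely ambiguous ones.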
Tomohiro OHNO Shigeki MATSUBARA Nobuo KAWAGUCHI Yasuyoshi INAGAKI
Spontaneously spoken Japanese includes many grammatically ill-formed linguistic phenomena, such as fillers, hesitations, and inversions, which do not appear in written language. This paper proposes a novel method of robust dependency parsing using a large-scale spoken language corpus, and evaluates the effectiveness and robustness of the method on spontaneously spoken dialogue sentences. By utilizing stochastic information about the appearance of ill-formed phenomena, the method can robustly parse spoken Japanese that includes fillers, inversions, or dependencies crossing utterance units. Experimental results show that the parsing accuracy reached 87.0%, and confirm that it is effective to utilize the location information of a bunsetsu and the distance information between bunsetsus as stochastic information.
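A minimal sketch of distance-sensitive head selection with fillers skipped, in the spirit of the stochastic information the abstract describes. The filler set and the distance probability are toy stand-ins for statistics estimated from a spoken corpus, not the paper's model.

```python
# Sketch: pick each bunsetsu's head by a toy distance probability,
# skipping filler bunsetsu entirely.
import math

FILLERS = {"ええと", "あの"}  # hypothetical filler bunsetsu

def p_dep(distance: int) -> float:
    """Toy P(dependency | distance): nearer bunsetsu pairs are likelier."""
    return math.exp(-0.7 * (distance - 1))

def best_heads(bunsetsus: list[str]) -> list[int]:
    """For each non-filler bunsetsu, pick the most probable later head."""
    heads = []
    for i, b in enumerate(bunsetsus):
        candidates = [j for j in range(i + 1, len(bunsetsus))
                      if bunsetsus[j] not in FILLERS]
        if b in FILLERS or not candidates:
            heads.append(-1)  # fillers and the final bunsetsu get no head
        else:
            heads.append(max(candidates, key=lambda j: p_dep(j - i)))
    return heads

print(best_heads(["きょうは", "ええと", "とても", "暑い"]))  # [2, -1, 3, -1]
```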
We propose a dependency parsing model for head-final, variable-word-order languages. Based on the observation that each word has its own preference for its modifying distance, and that the preferred distance varies with the word's surrounding context, we define a parsing model that reflects this preference. Experimental results show that a parser based on our model outperforms other parsers in terms of both precision and recall.
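A minimal sketch of head selection driven by per-word distance preferences, the core observation of the abstract. The preference table is a toy stand-in for the context-conditioned distributions the model would learn, and head-finality is encoded by only considering later words as heads.

```python
# Sketch: in a head-final language, choose each word's head among the
# later words by a toy per-word distance-preference distribution.

# Hypothetical P(distance | word): some modifiers attach close, others far.
PREFERRED = {"quickly": {1: 0.8, 2: 0.2},
             "yesterday": {1: 0.2, 2: 0.3, 3: 0.5}}

def pick_head(words: list[str], i: int) -> int:
    """Choose the later word (head-final) with the best distance score."""
    prefs = PREFERRED.get(words[i], {})
    candidates = range(i + 1, len(words))
    return max(candidates, key=lambda j: prefs.get(j - i, 0.01), default=-1)

words = ["yesterday", "quickly", "ran"]
print([pick_head(words, i) for i in range(len(words) - 1)])  # [2, 2]: both modify "ran"
```

In a fuller model the preference distribution would be conditioned on the surrounding context rather than on the word alone, which is what lets the same word attach near in one sentence and far in another.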