Semih YUMUSAK Erdogan DOGDU Halife KODAZ
Linked data sets are created using Semantic Web technologies; they are usually large, and the number of such datasets is growing. Query execution is therefore costly, and knowing the content of such datasets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability, and infrastructure. In these projects, linked data sets are classified and tagged principally using the VoID vocabulary. Although all linked data sources listed in these projects appear to be classified or tagged, there are few studies on the automated tagging and classification of newly arriving linked data sets. Here, we focus on the automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked datasets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts with document analysis techniques, treating every SPARQL endpoint as a separate document. In this regard, we used the WordNet semantic relations library combined with an adapted term frequency–inverse document frequency (tf-idf) analysis on the words and their semantic neighbours. From the WordNet database, we extracted information about comment/label objects in linked data sources using the hypernym, hyponym, homonym, meronym, region, topic, and usage semantic relations. We obtained significant results for the hypernym and topic relations: we can find words that identify data sets, and these can be used in the automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and scoring methods, which yielded better classification accuracy.
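The core scoring step of this abstract can be sketched as follows: treat each SPARQL endpoint's collected rdfs:label/rdfs:comment texts as one document, expand each term with its semantic neighbours, and score with tf-idf. The tiny hypernym map below is a hypothetical stand-in for the real WordNet hypernym relation, used only for illustration:

```python
import math
from collections import Counter

HYPERNYMS = {           # toy stand-in for WordNet hypernyms (an assumption)
    "protein": ["molecule"],
    "gene": ["molecule"],
    "city": ["location"],
}

def expand(tokens):
    """Append each token's hypernyms, mimicking semantic-neighbour expansion."""
    out = list(tokens)
    for t in tokens:
        out.extend(HYPERNYMS.get(t, []))
    return out

def tfidf(docs):
    """docs: {endpoint: [tokens]} -> {endpoint: {term: tf-idf score}}."""
    expanded = {name: expand(toks) for name, toks in docs.items()}
    n = len(expanded)
    df = Counter()                      # document frequency of each term
    for toks in expanded.values():
        df.update(set(toks))
    scores = {}
    for name, toks in expanded.items():
        tf = Counter(toks)
        scores[name] = {t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()}
    return scores

# Two hypothetical endpoints described by a few label/comment tokens
docs = {
    "bio_endpoint": ["protein", "gene", "protein"],
    "geo_endpoint": ["city", "city", "river"],
}
scores = tfidf(docs)
```

Terms with high scores in one endpoint but not others (here, "protein" or the expanded "molecule") are candidate tags; the paper then feeds such identifying words into classifiers.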
Raul Ernesto MENENDEZ-MORA Ryutaro ICHISE
An ability to assess similarity lies close to the core of cognition. Understanding it supports the comprehension of human success in tasks such as problem solving, categorization, memory retrieval, and inductive reasoning, which is the main reason it is a common research topic. In this paper, we introduce the idea of semantic differences and commonalities between words into the similarity computation process. Five new semantic similarity metrics are obtained by applying this scheme to traditional WordNet-based measures. We also combine the node-based similarity measures with a corpus-independent way of computing the information content. In an experimental evaluation of our approach on two standard word-pair datasets, four of the measures outperformed their classical versions, while the fifth performed as well as its unmodified counterpart.
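The classical ingredients this abstract builds on can be sketched as a Resnik-style node-based similarity with a corpus-independent ("intrinsic") information content derived from the taxonomy alone. The mini IS-A hierarchy below is an illustrative assumption, not WordNet itself, and the paper's five new metrics are not reproduced here:

```python
import math

# child -> parent edges of a toy IS-A hierarchy (illustrative assumption)
PARENT = {
    "cat": "feline", "lion": "feline",
    "feline": "animal", "dog": "animal",
    "animal": "entity",
}
CONCEPTS = set(PARENT) | set(PARENT.values())

def ancestors(c):
    """The concept itself plus every node above it in the hierarchy."""
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out

def subsumed(c):
    """All concepts having c as an ancestor (including c itself)."""
    return {x for x in CONCEPTS if c in ancestors(x)}

def intrinsic_ic(c):
    """Corpus-independent IC: more specific concepts get higher scores."""
    return 1 - math.log(len(subsumed(c))) / math.log(len(CONCEPTS))

def resnik(a, b):
    """Node-based similarity: IC of the most informative common subsumer."""
    common = ancestors(a) & ancestors(b)
    return max(intrinsic_ic(c) for c in common)
```

With this taxonomy, `resnik("cat", "lion")` exceeds `resnik("cat", "dog")` because their most informative common subsumer ("feline") is more specific than "animal"; the corpus-independent IC makes this computable from WordNet's structure without frequency counts.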
Improving clustering performance remains a challenging issue for most document clustering methods. Document clustering based on semantic features is highly efficient; however, such methods sometimes fail to cluster certain documents, such as highly complex ones. To improve the clustering of complex documents using semantic features, this paper proposes a document clustering method that uses condensed cluster terms and fuzzy association to efficiently cluster specific documents into meaningful topics based on the document set. The proposed method improves the quality of document clustering because it can group documents from the perspective of the cluster-topic terms using semantic features and synonyms, and can thus better represent the inherent structure of a document in connection with the document cluster topics. The experimental results demonstrate that the proposed method achieves better document clustering performance than other methods.
Tatsuya OGAWA Qiang MA Masatoshi YOSHIKAWA
In this paper, we propose a novel stakeholder mining mechanism for analyzing bias in news articles by comparing descriptions of stakeholders. Our mechanism is based on the presumption that interests often induce bias on the part of news agencies. As we use the term, a "stakeholder" is a participant in an event described in a news article who has some relationship with other participants in the article. Our approach attempts to elucidate the bias of articles from three aspects: the stakeholders, the interests of stakeholders, and the descriptive polarity of each stakeholder. Mining of stakeholders and their interests is achieved by analyzing sentence structure and using RelationshipWordNet, a lexical resource that we developed. For analyzing the polarities of stakeholder descriptions, we propose an opinion mining method based on the lexical resource SentiWordNet. From the analysis results, we construct a relations graph of stakeholders to group stakeholders sharing mutual interests and to represent their interests. We also describe an application system we developed for news comparison based on this mining mechanism. This paper presents experimental results to validate the proposed methods.
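The descriptive-polarity step of this abstract can be sketched as scoring the sentences that mention a stakeholder against a SentiWordNet-style lexicon of (positive, negative) word scores. The tiny lexicon and the word-level matching below are hypothetical simplifications of SentiWordNet's synset-level scores:

```python
# word -> (positive, negative) scores; illustrative stand-in for SentiWordNet
LEXICON = {
    "praised": (0.8, 0.0),
    "criticized": (0.0, 0.7),
    "strong": (0.6, 0.1),
}

def polarity(sentence):
    """Mean (positive - negative) score over lexicon words in the sentence."""
    hits = [LEXICON[w] for w in sentence.lower().split() if w in LEXICON]
    if not hits:
        return 0.0
    return sum(p - n for p, n in hits) / len(hits)

def stakeholder_polarity(stakeholder, sentences):
    """Aggregate polarity over the sentences mentioning the stakeholder."""
    rel = [s for s in sentences if stakeholder.lower() in s.lower()]
    return sum(polarity(s) for s in rel) / len(rel) if rel else 0.0

sentences = [
    "The union praised the policy",
    "Critics criticized the union",
]
score = stakeholder_polarity("union", sentences)
```

Per-stakeholder scores of this kind could then label nodes in the stakeholder relations graph, so that groups sharing mutual interests can be compared by how positively or negatively each outlet describes them.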