IEICE TRANSACTIONS on Information

Improving Text Categorization with Semantic Knowledge in Wikipedia

Xiang WANG, Yan JIA, Ruhua CHEN, Hua FAN, Bin ZHOU

Summary:

Text categorization, especially of short texts, is challenging because text data is sparse and multidimensional. Traditional text classification methods represent documents with the “Bag of Words” (BOW) scheme, which is based on word co-occurrence and has many limitations. In this paper, we map document texts to Wikipedia concepts and use this Wikipedia-concept-based document representation in place of the traditional BOW model for text classification. Such a representation still ignores the semantic relationships among terms; to overcome this weakness and to exploit the rich semantic knowledge in Wikipedia, we construct a semantic matrix that enriches the Wikipedia-concept-based document representation. Experimental evaluation on five real datasets of long and short texts shows that our approach outperforms the traditional BOW method.
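
The idea of enriching a concept-based representation with a semantic matrix can be illustrated with a small sketch. The summary does not say how the paper builds or applies its matrix, so everything here is an assumption made for illustration: the three concept names, the relatedness values in `S`, the helper names `enrich` and `cosine`, and the choice to enrich by a plain matrix product. It shows only the general effect: two documents that share no concept become similar once related concepts reinforce each other.

```python
import numpy as np

# Hypothetical concept vocabulary (illustrative, not taken from the paper).
concepts = ["Machine learning", "Artificial intelligence", "Football"]

# Assumed concept-by-concept relatedness matrix: symmetric, ones on the
# diagonal, values chosen by hand for this toy example.
S = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])


def enrich(doc_vectors, semantic_matrix):
    """Spread each document's concept weights onto related concepts."""
    return doc_vectors @ semantic_matrix


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Rows are documents, columns are concept weights.
docs = np.array([
    [1.0, 0.0, 0.0],  # mentions only "Machine learning"
    [0.0, 1.0, 0.0],  # mentions only "Artificial intelligence"
])

enriched = enrich(docs, S)

print("similarity before enrichment:", cosine(docs[0], docs[1]))
print("similarity after enrichment: ", cosine(enriched[0], enriched[1]))
```

Before enrichment the two documents are orthogonal, because neither contains a concept found in the other. After multiplying by `S`, each document also carries weight on the related concept, so a classifier working on the enriched vectors can treat them as close.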

Publication
IEICE TRANSACTIONS on Information Vol.E96-D No.12 pp.2786-2794
Publication Date
2013/12/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E96.D.2786
Type of Manuscript
PAPER
Category
Artificial Intelligence, Data Mining

Authors

Xiang WANG
  National University of Defense Technology
Yan JIA
  National University of Defense Technology
Ruhua CHEN
  National University of Defense Technology
Hua FAN
  National University of Defense Technology
Bin ZHOU
  National University of Defense Technology

Keyword