  • A Framework of Centroid-Based Methods for Text Categorization

    Dandan WANG  Qingcai CHEN  Xiaolong WANG  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E97-D  No: 2  Page(s): 245-254

    Text Categorization (TC) is the task of classifying documents into one or more predefined categories. The centroid-based method, a very popular TC approach, aims to make classifiers simple and efficient by constructing one prototype vector for each class. It assigns a document to the class whose prototype vector is nearest to the document. Many studies have addressed how to construct prototype vectors. However, the underlying philosophies of these methods differ considerably, which makes comparing and selecting centroid-based TC methods very difficult and makes their further development more challenging. In this paper, based on an observation of its general procedure, centroid-based text classification is treated as a kind of ranking task, and a unified framework for centroid-based TC methods is proposed. The goal of this unified framework is to classify a text by ranking all possible classes according to document-class similarity. Prototype vectors are constructed based on various loss functions for ranking classes. Under this framework, three popular centroid-based methods (Rocchio, Hypothesis Margin Centroid, and DragPushing) are unified and their details are discussed. A novel centroid-based TC method called SLRCM, which uses a smoothing ranking loss function, is further proposed. Experiments conducted on several standard databases show that the proposed SLRCM method outperforms the compared centroid-based methods and reaches the same performance as the state-of-the-art TC methods.
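
The general procedure described in the abstract (one prototype vector per class, then ranking all classes by document-class similarity) can be illustrated with a minimal sketch. It is not the paper's SLRCM method and uses none of its loss functions: the prototype here is simply the normalized sum of a class's document vectors, and cosine similarity is assumed as the similarity measure. The function names, the sparse-dictionary representation, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse term-weight vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def build_centroids(docs, labels):
    """Build one prototype vector per class (assumption: plain averaged centroid).

    Each document is length-normalized, summed into its class, and the
    class sum is normalized again so every prototype has unit length.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in normalize(doc).items():
            sums[label][term] += weight
    return {label: normalize(vec) for label, vec in sums.items()}


def rank_classes(doc, centroids):
    """Rank all classes by cosine similarity to the document, best first."""
    d = normalize(doc)
    scores = {
        label: sum(w * proto.get(t, 0.0) for t, w in d.items())
        for label, proto in centroids.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def classify(doc, centroids):
    """Assign the document to the top-ranked class."""
    return rank_classes(doc, centroids)[0][0]


# Toy term-frequency vectors (made up for illustration only).
train_docs = [
    {"ball": 2, "goal": 1},
    {"ball": 1, "team": 1},
    {"vote": 2, "party": 1},
    {"vote": 1, "law": 1},
]
train_labels = ["sports", "sports", "politics", "politics"]

centroids = build_centroids(train_docs, train_labels)
print(classify({"goal": 1, "ball": 1}, centroids))  # prints "sports"
```

In this sketch the ranking step is fixed, and only `build_centroids` would change if the prototype vectors were instead learned from a ranking loss, which is the role the abstract assigns to the loss functions in the unified framework.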