IEICE TRANSACTIONS on Information

Two Step POS Selection for SVM Based Text Categorization

Takeshi MASUYAMA, Hiroshi NAKAGAWA

Summary:

Although many researchers have verified the superiority of the Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance for SVM-based text categorization methods when words of all parts of speech (POS) are used as input and the number of training documents is large. This is caused by overfitting: the SVM sometimes selects unsuitable support vectors for a category in the training set. To avoid overfitting, we propose a two-step text categorization method that uses SVM with variable cascaded feature selection (VCFS). At each step of the cascade, the VCFS method selects, for each category, the best pair of word count and POS combination. It exploits the fact that the words with the highest mutual information for a category differ from one POS combination to another. Experiments confirmed the validity of the VCFS method against other SVM-based text categorization methods: its macro-averaged F1 measure (64.8%) was significantly better than any previously reported value, while its micro-averaged F1 measure (85.4%) was similar to reported values.
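
The summary reports both a macro-averaged and a micro-averaged F1 measure, and the gap between them is central to the result. The following minimal Python sketch shows how the two averages are usually computed from per-category counts and why a weak rare category lowers the macro average far more than the micro average. The category names and counts are invented for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one category
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_category: Dict[str, Counts]) -> float:
    """Average the per-category F1 scores, so every category weighs the same."""
    scores = [f1(*counts) for counts in per_category.values()]
    return sum(scores) / len(scores)


def micro_f1(per_category: Dict[str, Counts]) -> float:
    """Pool the counts over all categories, then compute a single F1."""
    tp = sum(c[0] for c in per_category.values())
    fp = sum(c[1] for c in per_category.values())
    fn = sum(c[2] for c in per_category.values())
    return f1(tp, fp, fn)


# Invented counts: one frequent category classified well,
# one rare category classified poorly.
counts = {
    "frequent": (90, 10, 10),
    "rare": (1, 3, 3),
}

print(f"macro-averaged F1: {macro_f1(counts):.3f}")
print(f"micro-averaged F1: {micro_f1(counts):.3f}")
```

Because the micro average is dominated by frequent categories, an improvement on rare categories shows up mainly in the macro average, which is consistent with the pattern the summary describes.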

Publication
IEICE TRANSACTIONS on Information Vol.E87-D No.2 pp.373-379
Publication Date
2004/02/01
Type of Manuscript
Special Section PAPER (Special Section on Information Processing Technology for Web Utilization)