The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] parts of speech (POS)(1hit)

1-1hit
  • Two Step POS Selection for SVM Based Text Categorization

    Takeshi MASUYAMA  Hiroshi NAKAGAWA  

     
    PAPER

      Vol:
    E87-D No:2
      Page(s):
    373-379

    Although many researchers have verified the superiority of Support Vector Machine (SVM) on text categorization tasks, some recent papers have reported much lower performance of SVM based text categorization methods when focusing on all types of parts of speech (POS) as input words and treating large numbers of training documents. This was caused by the overfitting problem that SVM sometimes selected unsuitable support vectors for each category in the training set. To avoid the overfitting problem, we propose a two step text categorization method with a variable cascaded feature selection (VCFS) using SVM. VCFS method selects a pair of the best number of words and the best POS combination for each category at each step of the cascade. We made use of the difference of words with the highest mutual information for each category on each POS combination. Through the experiments, we confirmed the validation of VCFS method compared with other SVM based text categorization methods, since our results showed that the macro-averaged F1 measure (64.8%) of VCFS method was significantly better than any reported F1 measures, though the micro-averaged F1 measure (85.4%) of VCFS method was similar to them.