The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] self-training(1hit)

1-1hit
  • Confidence-Driven Contrastive Learning for Document Classification without Annotated Data Open Access

    Zhewei XU  Mizuho IWAIHARA  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2024/04/19
      Vol:
    E107-D No:8
      Page(s):
    1029-1039

    Data sparsity has always been a problem in document classification, for which semi-supervised learning and few-shot learning are studied. An even more extreme scenario is to classify documents without any annotated data, but using only category names. In this paper, we introduce a nearest neighbor search-based method Con2Class to tackle this tough task. We intend to produce embeddings for predefined categories and predict category embeddings for all the unlabeled documents in a unified embedding space, such that categories can be easily assigned by searching the nearest predefined category in the embedding space. To achieve this, we propose confidence-driven contrastive learning, in which prompt-based templates are designed and MLM-maintained contrastive loss is newly proposed to finetune a pretrained language model for embedding production. To deal with the issue that no annotated data is available to validate the classification model, we introduce confidence factor to estimate the classification ability by evaluating the prediction confidence. The language model having the highest confidence factor is used to produce embeddings for similarity evaluation. Pseudo labels are then assigned by searching the semantically closest category name, which are further used to train a separate classifier following a progressive self-training strategy for final prediction. Our experiments on five representative datasets demonstrate the superiority of our proposed method over the existing approaches.