
Keyword Search Result

[Keyword] source retrieval (2 hits)

  • A Partial Matching Convolution Neural Network for Source Retrieval of Plagiarism Detection

    Leilei KONG, Yong HAN, Haoliang QI, Zhongyuan HAN

     
    LETTER-Natural Language Processing

    Publicized: 2021/03/03
    Vol: E104-D No:6
    Page(s): 915-918

    Source retrieval is the primary task of plagiarism detection. It searches for documents that may be the sources of the plagiarized passages in a suspicious document. State-of-the-art approaches usually rely on classical information retrieval models, such as the probabilistic model or the vector space model, to find plagiarism sources. However, the goal of source retrieval is to obtain the source documents that contain the plagiarized parts of the suspicious document, rather than to rank documents by their relevance to the whole suspicious document. To model this “partial matching” between documents, this paper proposes a Partial Matching Convolution Neural Network (PMCNN) for source retrieval. Specifically, PMCNN uses a sequential convolutional neural network to extract the plagiarism patterns of contiguous text segments. Experimental results on the PAN 2013 and PAN 2014 plagiarism source retrieval corpora show that PMCNN significantly improves source retrieval performance, outperforming other state-of-the-art document models.
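PMCNN learns its convolution filters from training data, and the sketch below is not that network. It is a hand-built, non-neural analogue that only illustrates why scoring contiguous segments can rank a partial source above a document that is merely similar to the suspicious document as a whole. It assumes whitespace tokenization, a binary per-token match signal, and a fixed all-ones kernel of hypothetical width `window` followed by max pooling. The function names `whole_document_score` and `partial_match_score` and the example texts are illustrative, not from the paper.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    # Naive lowercase whitespace tokenization (an assumption; real systems normalize further).
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity of two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def whole_document_score(suspicious: str, source: str) -> float:
    """Vector-space baseline: term-frequency cosine over the whole documents."""
    return cosine(Counter(tokens(suspicious)), Counter(tokens(source)))


def partial_match_score(suspicious: str, source: str, window: int = 5) -> float:
    """Hypothetical partial-matching score: best fraction of matched tokens
    in any contiguous window of the suspicious text."""
    vocab = set(tokens(source))
    hits = [1.0 if t in vocab else 0.0 for t in tokens(suspicious)]
    if not hits:
        return 0.0
    if len(hits) < window:
        return sum(hits) / len(hits)
    # 1-D convolution with a fixed all-ones kernel (a moving average) ...
    conv = [sum(hits[i:i + window]) / window for i in range(len(hits) - window + 1)]
    # ... followed by max pooling, so one strongly matching segment decides the score.
    return max(conv)


suspicious = ("retrieval models rank documents for each query while quantum tunnelling "
              "lets particles cross energy barriers and ranking models score documents")
source_a = "quantum tunnelling lets particles cross energy barriers in thin semiconductor junctions"
source_b = "retrieval models score documents per query ranking models score documents per query"

for name, src in [("A", source_a), ("B", source_b)]:
    print(name, round(whole_document_score(suspicious, src), 2),
          round(partial_match_score(suspicious, src), 2))
```

Source A contains a copied passage but shares little else with the suspicious text, so the whole-document cosine ranks it below the topically similar source B (about 0.43 vs 0.61). The windowed score reverses the order (1.0 vs 0.8) because a full window of A's passage matches.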

  • A Ranking Approach to Source Retrieval of Plagiarism Detection

    Leilei KONG, Zhimao LU, Zhongyuan HAN, Haoliang QI

     
    LETTER-Data Engineering, Web Information Systems

    Publicized: 2016/09/29
    Vol: E100-D No:1
    Page(s): 203-205

    This paper addresses the problem of source retrieval in plagiarism detection. The task of source retrieval is to retrieve all plagiarized sources of a suspicious document from a source document corpus while minimizing retrieval costs. Classification-based methods have achieved the best performance in current source retrieval research. This paper points out that it is more appropriate to cast the problem as ranking and to employ learning-to-rank methods to perform source retrieval. Specifically, it employs RankBoost and Ranking SVM to obtain the candidate plagiarism source documents. Experimental results on the PAN@CLEF 2013 Source Retrieval dataset show that the ranking-based methods significantly outperform the classification-based baseline methods. We argue that treating source retrieval as a ranking problem is better than treating it as a classification problem.
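The pairwise idea behind Ranking SVM can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' setup: it considers a single suspicious document whose candidate sources are described by hypothetical two-dimensional feature vectors, labels each candidate as a source (1) or not (0), and trains a linear scorer by plain stochastic subgradient descent on the pairwise hinge loss instead of a dedicated SVM solver. RankBoost, the paper's other method, is not shown. The names `train_ranking_svm`, `rank`, and the feature values are illustrative.

```python
import random


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pairwise_differences(features: list[list[float]], labels: list[int]) -> list[list[float]]:
    """Ranking SVM transform: one vector x_src - x_non for every (source, non-source) pair."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    return [[a - b for a, b in zip(p, n)] for p in pos for n in neg]


def train_ranking_svm(features, labels, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Minimize lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . d),
    using stochastic subgradient steps with a fixed learning rate (an assumption)."""
    diffs = pairwise_differences(features, labels)
    w = [0.0] * len(features[0])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = dot(w, d)
            # Regularization step: shrink w toward zero.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge step: push the source above the non-source if the pair is inside the margin.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w


def rank(features, w):
    """Return candidate indices ordered by descending linear score w . x."""
    return sorted(range(len(features)), key=lambda i: dot(w, features[i]), reverse=True)


# Hypothetical candidates for one suspicious document: [feature_1, feature_2].
candidates = [[0.9, 0.8], [0.2, 0.1], [0.7, 0.9], [0.3, 0.4], [0.1, 0.3]]
is_source = [1, 0, 1, 0, 0]

w = train_ranking_svm(candidates, is_source)
print(rank(candidates, w))
```

Because every training pair compares a source with a non-source of the same suspicious document, the model is optimized for the relative order of candidates rather than for an absolute source/non-source decision, which is the distinction the abstract draws between ranking and classification.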