This paper addresses the issue of source retrieval in plagiarism detection. The task of source retrieval is to retrieve all plagiarized sources of a suspicious document from a source document corpus while minimizing retrieval costs. Classification-based methods have achieved the best performance in existing research on source retrieval. This paper points out that it is more appropriate to cast the problem as ranking and to employ learning-to-rank methods to perform source retrieval. Specifically, it employs RankBoost and Ranking SVM to obtain the candidate plagiarism source documents. Experimental results on the PAN@CLEF 2013 Source Retrieval dataset show that the ranking-based methods significantly outperform the classification-based baseline methods. We argue that treating source retrieval as a ranking problem is better than treating it as a classification problem.
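To make the ranking formulation in the abstract concrete, the following is a minimal sketch of a pairwise linear ranker in the spirit of Ranking SVM. It is not the authors' implementation: a real Ranking SVM solves a quadratic program, whereas this sketch uses plain subgradient descent on the pairwise hinge loss. The two features (query-term overlap and snippet similarity), the toy values, and all function names are hypothetical and chosen only for illustration.

```python
import random


def pairwise_differences(features, labels):
    """Build difference vectors x_i - x_j for every pair where item i
    is labeled more relevant than item j (i should outrank j)."""
    pairs = []
    for xi, yi in zip(features, labels):
        for xj, yj in zip(features, labels):
            if yi > yj:
                pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_pairwise_ranker(pairs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on max(0, 1 - w.d) + (reg / 2) * |w|^2,
    where d is a difference vector from pairwise_differences."""
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # The hinge term contributes only while the margin is violated.
                hinge_grad = -d[k] if margin < 1.0 else 0.0
                w[k] -= lr * (reg * w[k] + hinge_grad)
    return w


def rank(w, features):
    """Return item indices sorted by descending linear score w.x."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Hypothetical candidates for one suspicious document:
# [query-term overlap, snippet similarity]; label 1 = true plagiarism source.
features = [[0.9, 0.8], [0.2, 0.1], [0.7, 0.9], [0.1, 0.3]]
labels = [1, 0, 1, 0]

pairs = pairwise_differences(features, labels)
w = train_pairwise_ranker(pairs, dim=2)
order = rank(w, features)
print(order)
```

A classifier would make an independent yes/no decision per candidate. The ranker instead learns only the relative order of candidates, so the top of `order` can be passed to later detection stages up to whatever retrieval budget is available.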
Leilei KONG
Harbin Engineering University, Heilongjiang Institute of Technology
Zhimao LU
Harbin Engineering University
Zhongyuan HAN
Heilongjiang Institute of Technology, Harbin Institute of Technology
Haoliang QI
Heilongjiang Institute of Technology
Leilei KONG, Zhimao LU, Zhongyuan HAN, Haoliang QI, "A Ranking Approach to Source Retrieval of Plagiarism Detection," in IEICE TRANSACTIONS on Information, vol. E100-D, no. 1, pp. 203-205, January 2017, doi: 10.1587/transinf.2016EDL8090.
Abstract: This paper addresses the issue of source retrieval in plagiarism detection. The task of source retrieval is to retrieve all plagiarized sources of a suspicious document from a source document corpus while minimizing retrieval costs. Classification-based methods have achieved the best performance in existing research on source retrieval. This paper points out that it is more appropriate to cast the problem as ranking and to employ learning-to-rank methods to perform source retrieval. Specifically, it employs RankBoost and Ranking SVM to obtain the candidate plagiarism source documents. Experimental results on the PAN@CLEF 2013 Source Retrieval dataset show that the ranking-based methods significantly outperform the classification-based baseline methods. We argue that treating source retrieval as a ranking problem is better than treating it as a classification problem.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDL8090/_p
@ARTICLE{e100-d_1_203,
author={Leilei KONG and Zhimao LU and Zhongyuan HAN and Haoliang QI},
journal={IEICE TRANSACTIONS on Information},
title={A Ranking Approach to Source Retrieval of Plagiarism Detection},
year={2017},
volume={E100-D},
number={1},
pages={203--205},
abstract={This paper addresses the issue of source retrieval in plagiarism detection. The task of source retrieval is to retrieve all plagiarized sources of a suspicious document from a source document corpus while minimizing retrieval costs. Classification-based methods have achieved the best performance in existing research on source retrieval. This paper points out that it is more appropriate to cast the problem as ranking and to employ learning-to-rank methods to perform source retrieval. Specifically, it employs RankBoost and Ranking SVM to obtain the candidate plagiarism source documents. Experimental results on the PAN@CLEF 2013 Source Retrieval dataset show that the ranking-based methods significantly outperform the classification-based baseline methods. We argue that treating source retrieval as a ranking problem is better than treating it as a classification problem.},
keywords={},
doi={10.1587/transinf.2016EDL8090},
ISSN={1745-1361},
month={January},
}
TY  - JOUR
TI  - A Ranking Approach to Source Retrieval of Plagiarism Detection
T2  - IEICE TRANSACTIONS on Information
SP  - 203
EP  - 205
AU  - Leilei KONG
AU  - Zhimao LU
AU  - Zhongyuan HAN
AU  - Haoliang QI
PY  - 2017
DO  - 10.1587/transinf.2016EDL8090
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E100-D
IS  - 1
JA  - IEICE TRANSACTIONS on Information
Y1  - January 2017
AB  - This paper addresses the issue of source retrieval in plagiarism detection. The task of source retrieval is to retrieve all plagiarized sources of a suspicious document from a source document corpus while minimizing retrieval costs. Classification-based methods have achieved the best performance in existing research on source retrieval. This paper points out that it is more appropriate to cast the problem as ranking and to employ learning-to-rank methods to perform source retrieval. Specifically, it employs RankBoost and Ranking SVM to obtain the candidate plagiarism source documents. Experimental results on the PAN@CLEF 2013 Source Retrieval dataset show that the ranking-based methods significantly outperform the classification-based baseline methods. We argue that treating source retrieval as a ranking problem is better than treating it as a classification problem.
ER  - 