In this paper, we propose a duplicate document detection model that recognizes both partial duplicates and near duplicates. By splitting a large document into many small sentence fingerprints, the proposed model detects partial duplicates as well as exact duplicates. Furthermore, by filtering out common words and reordering the word sequence, it can even detect near duplicates that result from trivial revisions.
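The abstract does not say exactly how the sentence fingerprints are built. The Python sketch below is one plausible reading, offered under stated assumptions. Each sentence is lowercased. Words from a small, hypothetical list of common words (`COMMON_WORDS`) are filtered out, and the remaining words are sorted to undo reordering. The result is then hashed with SHA-256. Partial duplication is measured as the share of one document's sentence fingerprints that also appear in another. The helper names (`split_sentences`, `normalize`, `fingerprints`, `overlap`) and the stopword list are illustrative and are not taken from the paper.

```python
import hashlib
import re

# HYPOTHETICAL list of "common words": the abstract does not say which words
# the authors filter, so this small English stopword set is illustrative only.
COMMON_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "to", "in",
    "on", "and", "or", "for", "it", "this", "that", "with", "at", "by",
}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence: str) -> str:
    """Lowercase, drop common words, and sort the rest into a canonical key."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    # Filtering common words makes trivial insertions/deletions of them harmless.
    content = [w for w in words if w not in COMMON_WORDS]
    # Sorting undoes trivial reordering of the remaining words.
    return " ".join(sorted(content))


def fingerprints(document: str) -> set[str]:
    """Return one SHA-256 fingerprint per (normalized) sentence of the document."""
    prints = set()
    for sentence in split_sentences(document):
        key = normalize(sentence)
        if key:  # skip sentences made up only of common words
            prints.add(hashlib.sha256(key.encode("utf-8")).hexdigest())
    return prints


def overlap(doc_a: str, doc_b: str) -> float:
    """Fraction of doc_a's sentence fingerprints that also occur in doc_b.

    1.0 means every fingerprinted sentence of doc_a has a normalized match in
    doc_b (exact or near duplicate); a value strictly between 0 and 1
    suggests partial duplication.
    """
    fa, fb = fingerprints(doc_a), fingerprints(doc_b)
    if not fa:
        return 0.0
    return len(fa & fb) / len(fa)


if __name__ == "__main__":
    original = ("The blog post reviews a new phone. Battery life is excellent. "
                "The camera is weak in low light.")
    copied = ("My thoughts today. Battery life is excellent. "
              "In low light the camera is weak.")
    print(round(overlap(original, copied), 3))  # prints 0.667
```

In the example, the copied post shares one sentence verbatim and one reordered sentence with the original, so two of its three fingerprints match. Sorting the words is what lets the reordered sentence match. The cost is that sentences with the same bag of words but a different meaning ("dog bites man" and "man bites dog") also collide, a trade-off that favors recall of near duplicates.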
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yeo-Chan YOON, Myung-Gil JANG, Hyun-Ki KIM, So-Young PARK, "Detecting Partial and Near Duplication in the Blogosphere," in IEICE TRANSACTIONS on Information, vol. E95-D, no. 2, pp. 681-685, February 2012, doi: 10.1587/transinf.E95.D.681.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E95.D.681/_p
@ARTICLE{e95-d_2_681,
author={Yeo-Chan YOON and Myung-Gil JANG and Hyun-Ki KIM and So-Young PARK},
journal={IEICE TRANSACTIONS on Information},
title={Detecting Partial and Near Duplication in the Blogosphere},
year={2012},
volume={E95-D},
number={2},
pages={681-685},
abstract={In this paper, we propose a duplicate document detection model recognizing both partial duplicates and near duplicates. The proposed model can detect partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, the proposed model can detect even near duplicates, the result of trivial revisions, by filtering the common words and reordering the word sequence.},
keywords={},
doi={10.1587/transinf.E95.D.681},
ISSN={1745-1361},
month={February},
}
TY  - JOUR
TI  - Detecting Partial and Near Duplication in the Blogosphere
T2  - IEICE TRANSACTIONS on Information
SP  - 681
EP  - 685
AU  - Yeo-Chan YOON
AU  - Myung-Gil JANG
AU  - Hyun-Ki KIM
AU  - So-Young PARK
PY  - 2012
DO  - 10.1587/transinf.E95.D.681
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E95-D
IS  - 2
JA  - IEICE TRANSACTIONS on Information
Y1  - 2012/02//
AB  - In this paper, we propose a duplicate document detection model recognizing both partial duplicates and near duplicates. The proposed model can detect partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, the proposed model can detect even near duplicates, the result of trivial revisions, by filtering the common words and reordering the word sequence.
ER  -