Detecting Partial and Near Duplication in the Blogosphere

Yeo-Chan YOON, Myung-Gil JANG, Hyun-Ki KIM, So-Young PARK

Summary :

In this paper, we propose a duplicate document detection model that recognizes both partial duplicates and near duplicates. The proposed model detects partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, it detects near duplicates, which result from trivial revisions, by filtering out common words and reordering the word sequence.
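
The idea in the summary can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that "reordering the word sequence" means sorting the remaining words alphabetically, that sentences are split on `.`, `!`, and `?`, and that MD5 serves as the fingerprint hash. The `COMMON_WORDS` list and the function names `sentence_fingerprint`, `document_fingerprints`, and `containment` are hypothetical and chosen only for illustration.

```python
import hashlib
import re

# Assumption: a tiny illustrative list of common words; the paper's list is not given here.
COMMON_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "very"}


def sentence_fingerprint(sentence):
    """Hash one sentence after dropping common words and sorting the rest."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    content = sorted(w for w in words if w not in COMMON_WORDS)
    if not content:
        return None  # nothing left to fingerprint
    return hashlib.md5(" ".join(content).encode("utf-8")).hexdigest()


def document_fingerprints(text):
    """Split a document into sentences and collect one fingerprint per sentence."""
    prints = set()
    for sentence in re.split(r"[.!?]+", text):
        fp = sentence_fingerprint(sentence)
        if fp is not None:
            prints.add(fp)
    return prints


def containment(candidate, source):
    """Fraction of the candidate's sentence fingerprints that also occur in the source."""
    cand = document_fingerprints(candidate)
    if not cand:
        return 0.0
    return len(cand & document_fingerprints(source)) / len(cand)
```

Because each sentence is fingerprinted separately, a post that copies only a few sentences from a longer one still shares those fingerprints with it, which is how partial duplication would show up under these assumptions. A sentence with inserted common words or a shuffled word order maps to the same fingerprint, so trivially revised copies are also matched.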

Publication
IEICE TRANSACTIONS on Information Vol.E95-D No.2 pp.681-685
Publication Date
2012/02/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E95.D.681
Type of Manuscript
LETTER
Category
Data Engineering, Web Information Systems
