Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files

Eunjong CHOI; Norihiro YOSHIDA; Yoshiki HIGO; Katsuro INOUE

doi:10.1587/transinf.2014EDP7292

IEICE TRANSACTIONS on Information


Summary:

Many approaches for detecting code clones have been proposed based on different degrees of normalization (e.g., removal of white space, tokenization, and regularization of identifiers). Different degrees of normalization lead to different granularities of source code being detected as code clones. To investigate how normalization impacts code clone detection, this study proposes six approaches for detecting code clones that preprocess the input source files using different degrees of normalization. More precisely, in the preprocessing, each normalization is applied to the input source files and the files are then partitioned into equivalence classes. After that, code clones are detected from the set of files that represent each equivalence class, using a token-based code clone detection tool named CCFinder. The proposed approaches fall into two categories: approaches with non-normalization and approaches with normalization. The former detects only identical files, without any normalization. The latter detects files that are identical after different degrees of normalization, such as removal of all lines containing macros. In the case study, we observed that the proposed approaches detect code clones faster than the approach that uses only CCFinder. We also found that the approach with non-normalization is the fastest among the proposed approaches in many cases.
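The preprocessing step described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of SHA-256 digests to compare normalized files, the choice of the first file in each class as its representative, and the two example normalizations are all assumptions made for this sketch. The CCFinder run itself is left out; the returned representatives are simply the files that would be handed to the clone detector.

```python
import hashlib
import re
from collections import defaultdict


def no_normalization(text):
    """Non-normalization: files must match character for character."""
    return text


def strip_whitespace(text):
    """Illustrative normalization (assumed): remove all white space."""
    return re.sub(r"\s+", "", text)


def drop_macro_lines(text):
    """Illustrative normalization (assumed): remove lines starting with '#'."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    return "\n".join(kept)


def partition(files, normalize):
    """Group file names whose normalized contents have the same digest."""
    classes = defaultdict(list)
    for name, text in files.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        classes[digest].append(name)
    return list(classes.values())


def representatives(files, normalize):
    """Pick one file (the first seen) from each equivalence class."""
    return [group[0] for group in partition(files, normalize)]


# Hypothetical input: file name -> file contents.
files = {
    "a.c": "#include <stdio.h>\nint main() { return 0; }\n",
    "b.c": "#include <stdio.h>\nint main() { return 0; }\n",
    "c.c": "#include <stdlib.h>\nint main() { return 0; }\n",
    "d.c": "int  main()  {  return 0;  }\n",
}

# Only these files would be passed on to the clone detector.
print(representatives(files, no_normalization))
print(representatives(files, drop_macro_lines))
```

In this toy input, a stronger normalization merges more files into one class, so fewer representatives reach the detector, while the normalization itself costs extra preprocessing time. That trade-off is what the case study in the paper measures.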

Publication: IEICE TRANSACTIONS on Information Vol.E98-D No.2 pp.325-333

Publication Date: 2015/02/01

Publicized: 2014/10/28

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2014EDP7292

Type of Manuscript: PAPER

Category: Software Engineering

Authors

Eunjong CHOI
  Osaka University
Norihiro YOSHIDA
  Nagoya University
Yoshiki HIGO
  Osaka University
Katsuro INOUE
  Osaka University

Keywords

code clone, hash function, source code transformation

Cite this


Eunjong CHOI, Norihiro YOSHIDA, Yoshiki HIGO, Katsuro INOUE, "Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files" in IEICE TRANSACTIONS on Information, vol. E98-D, no. 2, pp. 325-333, February 2015, doi: 10.1587/transinf.2014EDP7292.
Abstract: Many approaches for detecting code clones have been proposed based on different degrees of normalization (e.g., removal of white space, tokenization, and regularization of identifiers). Different degrees of normalization lead to different granularities of source code being detected as code clones. To investigate how normalization impacts code clone detection, this study proposes six approaches for detecting code clones that preprocess the input source files using different degrees of normalization. More precisely, in the preprocessing, each normalization is applied to the input source files and the files are then partitioned into equivalence classes. After that, code clones are detected from the set of files that represent each equivalence class, using a token-based code clone detection tool named CCFinder. The proposed approaches fall into two categories: approaches with non-normalization and approaches with normalization. The former detects only identical files, without any normalization. The latter detects files that are identical after different degrees of normalization, such as removal of all lines containing macros. In the case study, we observed that the proposed approaches detect code clones faster than the approach that uses only CCFinder. We also found that the approach with non-normalization is the fastest among the proposed approaches in many cases.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2014EDP7292/_p


@ARTICLE{e98-d_2_325,
author={Eunjong CHOI and Norihiro YOSHIDA and Yoshiki HIGO and Katsuro INOUE},
journal={IEICE TRANSACTIONS on Information},
title={Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files},
year={2015},
volume={E98-D},
number={2},
pages={325-333},
abstract={Many approaches for detecting code clones have been proposed based on different degrees of normalization (e.g., removal of white space, tokenization, and regularization of identifiers). Different degrees of normalization lead to different granularities of source code being detected as code clones. To investigate how normalization impacts code clone detection, this study proposes six approaches for detecting code clones that preprocess the input source files using different degrees of normalization. More precisely, in the preprocessing, each normalization is applied to the input source files and the files are then partitioned into equivalence classes. After that, code clones are detected from the set of files that represent each equivalence class, using a token-based code clone detection tool named CCFinder. The proposed approaches fall into two categories: approaches with non-normalization and approaches with normalization. The former detects only identical files, without any normalization. The latter detects files that are identical after different degrees of normalization, such as removal of all lines containing macros. In the case study, we observed that the proposed approaches detect code clones faster than the approach that uses only CCFinder. We also found that the approach with non-normalization is the fastest among the proposed approaches in many cases.},
keywords={code clone, hash function, source code transformation},
doi={10.1587/transinf.2014EDP7292},
ISSN={1745-1361},
month={February},
}


TY - JOUR
TI - Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files
T2 - IEICE TRANSACTIONS on Information
SP - 325
EP - 333
AU - Eunjong CHOI
AU - Norihiro YOSHIDA
AU - Yoshiki HIGO
AU - Katsuro INOUE
PY - 2015
DO - 10.1587/transinf.2014EDP7292
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E98-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2015
AB - Many approaches for detecting code clones have been proposed based on different degrees of normalization (e.g., removal of white space, tokenization, and regularization of identifiers). Different degrees of normalization lead to different granularities of source code being detected as code clones. To investigate how normalization impacts code clone detection, this study proposes six approaches for detecting code clones that preprocess the input source files using different degrees of normalization. More precisely, in the preprocessing, each normalization is applied to the input source files and the files are then partitioned into equivalence classes. After that, code clones are detected from the set of files that represent each equivalence class, using a token-based code clone detection tool named CCFinder. The proposed approaches fall into two categories: approaches with non-normalization and approaches with normalization. The former detects only identical files, without any normalization. The latter detects files that are identical after different degrees of normalization, such as removal of all lines containing macros. In the case study, we observed that the proposed approaches detect code clones faster than the approach that uses only CCFinder. We also found that the approach with non-normalization is the fastest among the proposed approaches in many cases.
ER -
