Software developers may write many similar source code fragments that contain the same mistake. To remove such faulty fragments, developers inspect code clones when they find a bug in their code. Although various code clone detection methods have been proposed to identify clones of code blocks or functions, those tools do not always fit this inspection task, because a faulty code fragment may be much smaller than a code block, e.g., a single line of code. To enable developers to search for clones of such a small faulty code fragment in a large-scale software product, we propose a method using Lempel-Ziv Jaccard Distance, which is an approximation of Normalized Compression Distance. We conducted an experiment using an existing research dataset and a user survey in a company. The results show that our method efficiently reports cloned faulty code fragments and that its performance is acceptable to software developers.
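To make the idea in the abstract concrete, the following is a minimal sketch of a sliding-window search driven by a Lempel-Ziv Jaccard style distance. It is not the authors' NCDSearch implementation. It assumes byte-level input, a fixed window equal to the query length, and an exact Jaccard computation over Lempel-Ziv dictionaries instead of any hashing-based approximation. The names `lz_set`, `lzjd`, and `search`, and the default `threshold` of 0.5, are hypothetical choices made only for illustration.

```python
from __future__ import annotations


def lz_set(data: bytes) -> set[bytes]:
    """Build a Lempel-Ziv style dictionary: each time the current piece
    has not been seen before, record it and start a new piece."""
    seen: set[bytes] = set()
    start = 0
    for end in range(1, len(data) + 1):
        piece = data[start:end]
        if piece not in seen:
            seen.add(piece)
            start = end
    return seen


def lzjd(a: bytes, b: bytes) -> float:
    """Jaccard distance between the two dictionaries (0.0 means identical sets)."""
    sa, sb = lz_set(a), lz_set(b)
    union = sa | sb
    if not union:
        return 0.0  # both inputs are empty
    return 1.0 - len(sa & sb) / len(union)


def search(query: bytes, source: bytes, threshold: float = 0.5) -> list[tuple[int, float]]:
    """Slide a window of len(query) bytes over source and return
    (offset, distance) pairs whose distance is at most the threshold.
    The threshold value is an illustrative assumption."""
    window = len(query)
    hits = []
    for pos in range(0, max(len(source) - window, 0) + 1):
        d = lzjd(query, source[pos:pos + window])
        if d <= threshold:
            hits.append((pos, d))
    return hits


if __name__ == "__main__":
    code = b"int main() { char *p = malloc(8); free(p); free(p); return 0; }"
    for offset, dist in search(b"free(p);", code, threshold=0.3):
        print(offset, round(dist, 3))
```

Because the window is compared as raw bytes, a query as short as one line can be matched anywhere in a file, which is the use case the abstract describes. A practical tool would likely tokenize the source and tune the window and threshold.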
Takashi ISHIO
Nara Institute of Science and Technology
Naoto MAEDA
NEC Corporation
Kensuke SHIBUYA
NEC Corporation
Kenho IWAMOTO
NEC Corporation
Katsuro INOUE
Osaka University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takashi ISHIO, Naoto MAEDA, Kensuke SHIBUYA, Kenho IWAMOTO, Katsuro INOUE, "NCDSearch: Sliding Window-Based Code Clone Search Using Lempel-Ziv Jaccard Distance," in IEICE TRANSACTIONS on Information, vol. E105-D, no. 5, pp. 973-981, May 2022, doi: 10.1587/transinf.2021EDP7222.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2021EDP7222/_p
@ARTICLE{e105-d_5_973,
author={Takashi ISHIO and Naoto MAEDA and Kensuke SHIBUYA and Kenho IWAMOTO and Katsuro INOUE},
journal={IEICE TRANSACTIONS on Information},
title={NCDSearch: Sliding Window-Based Code Clone Search Using Lempel-Ziv Jaccard Distance},
year={2022},
volume={E105-D},
number={5},
pages={973--981},
keywords={},
doi={10.1587/transinf.2021EDP7222},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - NCDSearch: Sliding Window-Based Code Clone Search Using Lempel-Ziv Jaccard Distance
T2 - IEICE TRANSACTIONS on Information
SP - 973
EP - 981
AU - Takashi ISHIO
AU - Naoto MAEDA
AU - Kensuke SHIBUYA
AU - Kenho IWAMOTO
AU - Katsuro INOUE
PY - 2022
DO - 10.1587/transinf.2021EDP7222
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2022
ER -