This paper presents a Siamese model built from two identical Convolutional Neural Networks (CNNs) to identify code clones. Two code fragments are represented as Abstract Syntax Trees (ASTs), the CNN-based subnetworks extract a feature vector from the AST of each fragment in the pair, and the output layer produces a score indicating how similar or dissimilar the two fragments are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at the source code or bytecode level.
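The pipeline described in the abstract (AST representation, a shared feature extractor, and a similarity output) can be illustrated with a minimal, untrained sketch. Everything below is an assumption made for illustration and is not the paper's implementation: Python's standard `ast` module stands in for the AST parser, node types are embedded with a deterministic pseudo-random vector rather than a learned embedding, a single convolution with global max pooling stands in for the CNN subnetwork, and cosine similarity stands in for the output layer. The names `SharedCNN`, `node_type_sequence`, `embed`, and `similarity`, and all sizes, are hypothetical.

```python
import ast
import math
import random

EMBED_DIM = 8      # hypothetical size of each node-type embedding
NUM_FILTERS = 16   # hypothetical number of convolution filters
KERNEL = 3         # hypothetical convolution window, measured in AST nodes


def node_type_sequence(source):
    """Flatten the AST of `source` into a list of node-type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def embed(name):
    """Deterministic pseudo-random vector for a node type.

    Stand-in for a learned embedding; seeding with the name keeps it stable.
    """
    rng = random.Random(name)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


class SharedCNN:
    """One convolution + ReLU + global max pooling, with untrained weights.

    The Siamese property comes from using the SAME instance (same weights)
    for both code fragments.
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.filters = [
            [rng.uniform(-0.5, 0.5) for _ in range(KERNEL * EMBED_DIM)]
            for _ in range(NUM_FILTERS)
        ]

    def features(self, source):
        seq = [embed(name) for name in node_type_sequence(source)]
        # Zero-pad very short sequences so at least one window exists.
        while len(seq) < KERNEL:
            seq.append([0.0] * EMBED_DIM)
        pooled = [0.0] * NUM_FILTERS
        for start in range(len(seq) - KERNEL + 1):
            window = [x for vec in seq[start:start + KERNEL] for x in vec]
            for f, weights in enumerate(self.filters):
                activation = max(0.0, sum(w * x for w, x in zip(weights, window)))
                pooled[f] = max(pooled[f], activation)  # global max pooling
        return pooled


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(cnn, fragment_a, fragment_b):
    """Score a pair of fragments with the shared subnetwork."""
    return cosine(cnn.features(fragment_a), cnn.features(fragment_b))


if __name__ == "__main__":
    cnn = SharedCNN()
    original = "def add(a, b):\n    return a + b\n"
    renamed = "def plus(x, y):\n    return x + y\n"
    print(similarity(cnn, original, renamed))
```

Because renaming identifiers leaves the node types unchanged, the two fragments in the usage example yield identical feature vectors in this sketch. A trained model would instead learn the embedding and filter weights from labeled clone pairs, which this example does not attempt.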
Dong Kwan KIM
Mokpo National Maritime University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Dong Kwan KIM, "A Deep Neural Network-Based Approach to Finding Similar Code Segments" in IEICE TRANSACTIONS on Information,
vol. E103-D, no. 4, pp. 874-878, April 2020, doi: 10.1587/transinf.2019EDL8195.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8195/_p
@ARTICLE{e103-d_4_874,
author={Dong Kwan KIM},
journal={IEICE TRANSACTIONS on Information},
title={A Deep Neural Network-Based Approach to Finding Similar Code Segments},
year={2020},
volume={E103-D},
number={4},
pages={874-878},
keywords={},
doi={10.1587/transinf.2019EDL8195},
ISSN={1745-1361},
month={April},}
TY - JOUR
TI - A Deep Neural Network-Based Approach to Finding Similar Code Segments
T2 - IEICE TRANSACTIONS on Information
SP - 874
EP - 878
AU - Dong Kwan KIM
PY - 2020
DO - 10.1587/transinf.2019EDL8195
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2020
ER -