A Deep Neural Network-Based Approach to Finding Similar Code Segments

Dong Kwan KIM

doi:10.1587/transinf.2019EDL8195

IEICE TRANSACTIONS on Information

Summary:

This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.
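The pipeline in the summary (two fragments, each turned into an AST, passed through the same feature extractor, then compared) can be illustrated with a minimal sketch. This is not the paper's model: as a stated assumption, the learned CNN subnetwork is replaced by a simple count of AST node types, and the learned output layer by cosine similarity. The function names `encode`, `cosine`, and `similarity` and the sample fragments are hypothetical, and Python's standard `ast` module stands in for whatever parser the paper used.

```python
import ast
import math
from collections import Counter


def encode(source: str) -> Counter:
    """Shared encoder applied to both inputs (the 'Siamese' part).

    Assumption: a histogram of AST node types stands in for the
    CNN-based feature extractor described in the summary.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(frag_a: str, frag_b: str) -> float:
    """Encode both fragments with the same encoder, then compare."""
    return cosine(encode(frag_a), encode(frag_b))


# Hypothetical fragments: two clones that differ only in identifiers,
# and one structurally unrelated fragment.
loop_sum = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
unrelated = "class Point:\n    pass\n"

clone_score = similarity(loop_sum, renamed)
other_score = similarity(loop_sum, unrelated)
```

Because both inputs go through the same `encode` function, renaming identifiers leaves the feature vector unchanged, so `clone_score` is higher than `other_score`. A trained network would learn its features from labeled clone pairs instead of relying on fixed node counts.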

Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.4 pp.874-878

Publication Date: 2020/04/01

Publicized: 2020/01/17

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2019EDL8195

Type of Manuscript: LETTER

Category: Software Engineering

Authors

Dong Kwan KIM
Mokpo National Maritime University

Keywords

code clone detection, Siamese architecture, convolutional neural network, abstract syntax tree (AST)

Cite this

Dong Kwan KIM, "A Deep Neural Network-Based Approach to Finding Similar Code Segments" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 4, pp. 874-878, April 2020, doi: 10.1587/transinf.2019EDL8195.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDL8195/_p

@ARTICLE{e103-d_4_874,
author={Kim, Dong Kwan},
journal={IEICE TRANSACTIONS on Information},
title={A Deep Neural Network-Based Approach to Finding Similar Code Segments},
year={2020},
volume={E103-D},
number={4},
pages={874--878},
abstract={This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.},
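keywords={code clone detection, Siamese architecture, convolutional neural network, abstract syntax tree (AST)},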
doi={10.1587/transinf.2019EDL8195},
ISSN={1745-1361},
month={April}
}

TY  - JOUR
TI  - A Deep Neural Network-Based Approach to Finding Similar Code Segments
T2  - IEICE TRANSACTIONS on Information
SP  - 874
EP  - 878
AU  - Kim, Dong Kwan
PY  - 2020
DO  - 10.1587/transinf.2019EDL8195
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E103-D
IS  - 4
JA  - IEICE TRANSACTIONS on Information
Y1  - April 2020
AB  - This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones; two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of pairwise code fragments, and the output layer produces how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at source code or bytecode levels.
ER  - 