
Keyword Search Result

[Keyword] code clone (7 hits)

1-7 hits
  • Dataset of Functionally Equivalent Java Methods and Its Application to Evaluating Clone Detection Tools Open Access

    Yoshiki HIGO  

    PAPER-Software System

      Publicized:
    2024/02/21
      Vol:
    E107-D No:6
      Page(s):
    751-760

    Modern high-level programming languages have rich grammars, so the same functionality can be implemented in many different ways. The authors believe that a large amount of code implementing the same functionality in different ways exists even in open source software, whose source code is publicly available, and that collecting such code would yield a dataset useful for various studies in software engineering. In this study, we construct a dataset of pairs of Java methods that have the same functionality but different structures, drawn from approximately 314 million lines of source code. To construct this dataset, the authors used EvoSuite, an automated test generation technique. Test cases generated by automated test generation techniques have the property that they always succeed on the method they were generated from. Using this property, the test cases generated from two methods were executed against each other to automatically determine whether the two methods behave the same, at least to some extent. Pairs of methods for which all cross-run test cases succeeded were then manually investigated to confirm that they are functionally equivalent. This paper also reports an evaluation of the accuracy of code clone detection tools using the constructed dataset. The purpose of this evaluation is to assess how accurately code clone detection tools find functionally equivalent methods, not how accurately they detect ordinary clones. The constructed dataset is available on GitHub (https://github.com/YoshikiHigo/FEMPDataset).
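
The cross-running check described above can be sketched as follows. This is a minimal Python illustration, not the author's tooling: the dataset was built with EvoSuite on Java methods, whereas here a "generated test" is assumed to be a hypothetical (arguments, recorded result) pair, and `generate_tests`, `passes_all`, and `cross_run_equivalent` are illustrative names. Passing the check only marks a candidate pair; as the abstract notes, such pairs were still investigated manually.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical stand-in for a generated test: (arguments, recorded result).
TestCase = Tuple[tuple, Any]


def generate_tests(method: Callable, inputs: List[tuple]) -> List[TestCase]:
    """Record the observed result of `method` on each input.

    Like regression-style generated tests, every test produced here
    passes on the method it was generated from.
    """
    return [(args, method(*args)) for args in inputs]


def passes_all(method: Callable, tests: List[TestCase]) -> bool:
    """Run tests generated from some other method against `method`."""
    for args, expected in tests:
        try:
            if method(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failing test
    return True


def cross_run_equivalent(f: Callable, g: Callable,
                         inputs_f: List[tuple], inputs_g: List[tuple]) -> bool:
    """Candidate pair: f's tests pass on g AND g's tests pass on f."""
    return (passes_all(g, generate_tests(f, inputs_f))
            and passes_all(f, generate_tests(g, inputs_g)))


# Two structurally different implementations of the same function ...
def sum_loop(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    return n * (n + 1) // 2


# ... and one whose behaviour differs.
def sum_wrong(n: int) -> int:
    return n * n


inputs = [(0,), (1,), (5,), (10,)]
print(cross_run_equivalent(sum_loop, sum_formula, inputs, inputs))  # True
print(cross_run_equivalent(sum_loop, sum_wrong, inputs, inputs))    # False
```

Agreement on finitely many inputs only suggests equivalence: `sum_loop` and `sum_formula` diverge on negative inputs (0 versus 3 for `n = -3`), which is why manual confirmation remains necessary.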

  • A Deep Neural Network-Based Approach to Finding Similar Code Segments

    Dong Kwan KIM  

    LETTER-Software Engineering

      Publicized:
    2020/01/17
      Vol:
    E103-D No:4
      Page(s):
    874-878

    This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones: two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of the paired code fragments, and the output layer indicates how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at the source code or bytecode level.
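
The weight-sharing idea behind a Siamese model can be illustrated without a neural network. In the minimal Python sketch below, which is an assumption-laden stand-in and not the paper's model, the trained CNN subnetwork is replaced by a fixed encoder that counts AST node types, and Python's own `ast` module is used instead of a Java parser. What it does show is the Siamese structure: one shared encoder is applied to both fragments, and a similarity score (cosine similarity here) is computed from the two resulting vectors.

```python
import ast
import math
from collections import Counter


def encode(source: str) -> Counter:
    """Shared 'subnetwork': map a code fragment to a feature vector.

    Stand-in for the paper's CNN: a bag of AST node-type counts, which
    ignores identifier names entirely.
    """
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def siamese_similarity(a: str, b: str) -> float:
    # The same encoder (i.e. shared weights) processes both inputs.
    return cosine(encode(a), encode(b))


frag_a = "def f(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
frag_b = "def g(items):\n    t = 0\n    for i in items:\n        t += i\n    return t\n"
frag_c = "print('hello')\n"

print(round(siamese_similarity(frag_a, frag_b), 3))  # 1.0: same structure, renamed
print(siamese_similarity(frag_a, frag_c) < 0.9)      # True
```

In a trained model the encoder's parameters would typically be learned from labelled pairs, so that restructured fragments also land close together; a fixed node-type histogram only tolerates renaming.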

  • Changes of Evaluation Values on Component Rank Model by Taking Code Clones into Consideration

    Reishi YOKOMORI  Norihiro YOSHIDA  Masami NORO  Katsuro INOUE  

    PAPER-Software System

      Publicized:
    2017/10/05
      Vol:
    E101-D No:1
      Page(s):
    130-141

    Many software systems have been used and maintained for a long time. Through such long-term maintenance, similar code fragments are often intentionally left in the source code, and how to manage a software system that contains many similar code fragments becomes a major concern. In this study, we propose a method to identify components that are commonly used in similar code fragments of a target software system. The method uses the component rank model and compares the evaluation value of each component before and after merging components that have similar code fragments. In many cases, a component whose evaluation value decreases is used by both of the merged components, so we consider such components to be commonly used in similar code fragments. Based on this approach, we implemented a system that calculates the difference in evaluation value for each component, and conducted three experiments to confirm that the method is useful for detecting components commonly used in similar code fragments, and to examine how it can help developers when they add similar components. Based on the experimental results, we also discuss several improvements and report the results of applying them.
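
The before/after comparison can be sketched in Python. This is an illustration under stated assumptions: the component rank model is approximated by a plain PageRank-style power iteration over an unweighted use graph (an edge A -> B meaning "A uses B"), which is not the paper's exact formulation, and `component_rank`, `merge`, and the component names are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # component -> components it uses


def component_rank(graph: Graph, damping: float = 0.85,
                   iterations: int = 100) -> Dict[str, float]:
    """PageRank-style power iteration: value flows from users to used components."""
    nodes = set(graph) | {v for used in graph.values() for v in used}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            used = graph.get(u, set())
            if used:
                share = damping * rank[u] / len(used)
                for v in used:
                    new[v] += share
            else:
                # A component that uses nothing spreads its value evenly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


def merge(graph: Graph, a: str, b: str, merged: str) -> Graph:
    """Collapse components a and b (which contain similar code) into one node."""
    def rename(c: str) -> str:
        return merged if c in (a, b) else c

    result: Graph = {}
    for user, used in graph.items():
        new_user = rename(user)
        result.setdefault(new_user, set()).update(
            rename(v) for v in used if rename(v) != new_user)  # drop self-loops
    return result


# A and B contain similar code and both use U; C uses only V.
before = {"A": {"U"}, "B": {"U"}, "C": {"V"}}
after = merge(before, "A", "B", "AB")

r_before, r_after = component_rank(before), component_rank(after)
decreased = sorted(c for c in r_after if c in r_before and r_after[c] < r_before[c])
print(decreased)  # ['U']: the component used by both merged components
```

Here U's value drops (roughly 0.358 to 0.325) because it now receives value from one merged user instead of two, while V and C rise; flagging the decreased components mirrors the idea in the abstract.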

  • CLCMiner: Detecting Cross-Language Clones without Intermediates

    Xiao CHENG  Zhiming PENG  Lingxiao JIANG  Hao ZHONG  Haibo YU  Jianjun ZHAO  

    PAPER-Software Engineering

      Publicized:
    2016/11/21
      Vol:
    E100-D No:2
      Page(s):
    273-284

    The proliferation of programming languages and platforms makes it common to implement the same functionality in different languages for different platforms, such as Java for Android applications and C# for Windows Phone applications. Although versions of code written in different languages look syntactically quite different, they are intended to implement the same software and typically contain many code snippets that implement similar functionality, which we call cross-language clones. When the code in one language evolves because of changing functional requirements and/or bug fixes, its cross-language clones may also need to be changed to keep the implementations of the same functionality consistent. Thus, automated ways to locate and track cross-language clones within evolving software are needed. In the literature, approaches for detecting cross-language clones apply only to languages that share a common intermediate language (such as the .NET language family), because they are built on techniques for detecting single-language clones. To extend cross-language clone detection to more diverse languages, we propose CLCMiner, a novel automated approach that does not need an intermediate language. It mines such clones from revision histories, based on our assumption that revisions to versions of code implemented in different languages may naturally reflect how programmers change cross-language clones in practice, and that similarities among the revisions (referred to as clones in diffs, or diff clones) may indicate actual similar code. We implemented a prototype and applied it to the Java and C# implementations of ten open source projects. The clones reported from the revision histories have high precision (89% on average) and recall (95% on average). Compared with token-based code clone detection tools that can treat code as plain text, our tool detects significantly more cross-language clones. All the evaluation results demonstrate the feasibility of revision-history-based techniques for detecting cross-language clones without intermediates and point to promising future work.
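
The notion of clones in diffs can be illustrated with a small Python sketch that scores how similar two revisions are, given their unified diffs. It is an assumption-based simplification, not CLCMiner's algorithm: the regular-expression tokenizer, the use of `difflib.SequenceMatcher` as the similarity measure, and the sample Java and C# diffs are all illustrative.

```python
import re
from difflib import SequenceMatcher
from typing import List

# Language-agnostic tokenizer (assumption): identifiers, numbers, single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def changed_tokens(diff: str) -> List[str]:
    """Tokens on the added/removed lines of a unified diff; file headers are skipped."""
    tokens: List[str] = []
    for line in diff.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            tokens.append(line[0])  # keep whether the line was added or removed
            tokens.extend(TOKEN.findall(line[1:]))
    return tokens


def diff_similarity(diff_a: str, diff_b: str) -> float:
    """Similarity in [0, 1] between the changes made by two revisions."""
    return SequenceMatcher(None, changed_tokens(diff_a), changed_tokens(diff_b)).ratio()


java_diff = "-    if (count > 10) {\n+    if (count >= limit) {\n"
csharp_diff = "-        if (count > 10)\n+        if (count >= limit)\n"
unrelated_diff = '+    logger.info("done");\n'

print(diff_similarity(java_diff, csharp_diff) > 0.9)     # True
print(diff_similarity(java_diff, unrelated_diff) < 0.5)  # True
```

The Java and C# lines differ textually (brace placement), yet the same edit, replacing a constant bound with a variable, yields nearly identical token changes; that shared change is the kind of signal diff clones rely on.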

  • Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files

    Eunjong CHOI  Norihiro YOSHIDA  Yoshiki HIGO  Katsuro INOUE  

    PAPER-Software Engineering

      Publicized:
    2014/10/28
      Vol:
    E98-D No:2
      Page(s):
    325-333

    So far, many approaches for detecting code clones have been proposed based on different degrees of normalization (e.g., removal of white space, tokenization, and regularization of identifiers). Different degrees of normalization lead to different granularities of source code being detected as code clones. To investigate how normalization affects code clone detection, this study proposes six approaches for detecting code clones that preprocess the input source files using different degrees of normalization. More precisely, in the preprocessing step each normalization is applied to the input source files, and then equivalence class partitioning is performed on the files. After that, code clones are detected from the set of files that represent each equivalence class, using a token-based code clone detection tool named CCFinder. The proposed approaches fall into two categories: approaches without normalization and approaches with normalization. The former detects only identical files without any normalization. The latter detects files that are identical after different degrees of normalization, such as removal of all lines containing macros. In the case study, we observed that our proposed approaches detect code clones faster than the approach that uses CCFinder alone. We also found that the approach without normalization is the fastest of the proposed approaches in many cases.
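
The preprocessing step can be sketched in Python: normalize each file, place files with identical normalized content in the same equivalence class, and keep one representative per class for the clone detector. The normalization functions below are illustrative assumptions, not the paper's six approaches; in particular, "removal of all lines containing macros" is approximated by dropping lines that begin with `#`, and CCFinder itself is not invoked.

```python
import hashlib
import re
from collections import defaultdict
from typing import Callable, Dict, List


def no_normalization(text: str) -> str:
    return text


def remove_whitespace(text: str) -> str:
    return re.sub(r"\s+", "", text)


def remove_directive_lines(text: str) -> str:
    # Approximation (assumption): drop lines that start with a preprocessor directive.
    return "\n".join(line for line in text.splitlines()
                     if not line.lstrip().startswith("#"))


def partition(files: Dict[str, str],
              normalize: Callable[[str], str]) -> List[List[str]]:
    """Group files whose normalized contents are identical (equivalence classes)."""
    classes: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(files):
        digest = hashlib.sha256(normalize(files[path]).encode("utf-8")).hexdigest()
        classes[digest].append(path)
    return list(classes.values())


def representatives(files: Dict[str, str],
                    normalize: Callable[[str], str]) -> List[str]:
    """One file per class; only these would be handed to the clone detector."""
    return [members[0] for members in partition(files, normalize)]


files = {
    "a.c": "int x = 1;\n",
    "b.c": "int x = 1;\n",
    "c.c": "int  x=1;\n",
    "d.c": "#include <stdio.h>\nint y;\n",
    "e.c": "#define N 3\nint y;\n",
}
for normalize in (no_normalization, remove_whitespace, remove_directive_lines):
    print(normalize.__name__, representatives(files, normalize))
# no_normalization ['a.c', 'c.c', 'd.c', 'e.c']
# remove_whitespace ['a.c', 'd.c', 'e.c']
# remove_directive_lines ['a.c', 'c.c', 'd.c']
```

On this toy input, either normalization leaves three representatives instead of four, so the downstream detector has fewer files to scan.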

  • An Investigation into the Characteristics of Merged Code Clones during Software Evolution

    Eunjong CHOI  Norihiro YOSHIDA  Katsuro INOUE  

    PAPER-Software Engineering

      Vol:
    E97-D No:5
      Page(s):
    1244-1253

    Although code clones (i.e. code fragments that have similar or identical code fragments in the source code) are regarded as a factor that increases the complexity of software maintenance, tools for supporting clone refactoring (i.e. merging a set of code clones into a single method or function) are not commonly used. To promote the development of refactoring tools that can be more widely utilized, we present an investigation of clone refactoring carried out in the development of open source software systems. In the investigation, we identified the most frequently used refactoring patterns and discovered how merged code clone token sequences and differences in token sequence lengths vary for each refactoring pattern.

  • A Feature Analysis of Co-changed Code Clone by Using Clone Metrics

    Myrizki SANDHI YUDHA  Ryohei ASANO  Hirohisa AMAN  

    LETTER

      Vol:
    E95-A No:9
      Page(s):
    1498-1500

    Code clones are duplicated or similar code fragments, and they are known to be major entities affecting software maintainability. Sometimes a pair of code clones is “co-changed”: when one code fragment is changed, its clone is also changed. Such a co-change is one of the key events in discussing the successful management of code clones. This paper analyzes the trends of co-changed code clones using the length and the content of code clones. The empirical results show that (1) there may be a specific clone length that is most often co-changed (around 60-100 tokens), and (2) code clones without any “control flow keywords” are more likely to be co-changed than others.
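
The two clone metrics mentioned above (length in tokens and the presence of control flow keywords) and a co-change rate per group can be sketched in Python. The tokenizer, the keyword set, and the records in the example are assumptions for illustration only; the records are toy input, not data from the study.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumed keyword set; the paper's exact list may differ.
CONTROL_FLOW = {"if", "else", "for", "while", "do", "switch", "case",
                "break", "continue", "return", "goto"}


def clone_metrics(fragment: str) -> Tuple[int, bool]:
    """(length in tokens, whether any control flow keyword appears)."""
    tokens = TOKEN.findall(fragment)
    return len(tokens), any(t in CONTROL_FLOW for t in tokens)


def co_change_rate(records: Iterable[Tuple[str, bool]],
                   key: Callable[[str], Hashable]) -> Dict[Hashable, float]:
    """Fraction of clones that were co-changed, grouped by key(fragment)."""
    counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])  # [co-changed, total]
    for fragment, co_changed in records:
        group = counts[key(fragment)]
        group[0] += int(co_changed)
        group[1] += 1
    return {k: changed / total for k, (changed, total) in counts.items()}


# Toy records: (clone fragment, was it co-changed with its counterpart?)
records = [
    ("x = a + b;", True),
    ("y = c * d;", False),
    ("if (x) return y;", False),
    ("while (i < n) i++;", False),
]
print(co_change_rate(records, key=lambda f: clone_metrics(f)[1]))
# {False: 0.5, True: 0.0}
```

Passing `key=lambda f: clone_metrics(f)[0] // 20` instead would group the same records into 20-token length bands, the other dimension the abstract examines.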