Keyword Search Result

[Keyword] TF-IDF (2 hits)

  • Boosting Spectrum-Based Fault Localization via Multi-Correct Programs in Online Programming (Open Access)

    Wei ZHENG  Hao HU  Tengfei CHEN  Fengyu YANG  Xin FAN  Peng XIAO  

     
    PAPER-Software Engineering
    Publicized: 2023/12/11
    Vol: E107-D  No: 4
    Page(s): 525-536

    Providing students with useful feedback on faulty programs can effectively help them fix those programs. Spectrum-Based Fault Localization (SBFL), a widely studied and lightweight technique, automatically ranks statements by suspiciousness value to help users find potential faults in a program. However, the performance of SBFL on student programs is not satisfactory. To improve the accuracy of SBFL on student programs, we propose a novel Multi-Correct Programs based Fault Localization (MCPFL) approach. Specifically, we first collected the correct programs submitted by students on the online judge (OJ) system according to the programming problem numbers, removed highly similar correct programs based on code similarity, and then stored the remaining programs together with the faulty program to be located to construct a set of programs. Afterward, we analyzed the suspiciousness of each term in the faulty program through Term Frequency-Inverse Document Frequency (TF-IDF). Finally, we designed a formula that calculates a suspiciousness weight for each program statement based on the number of input variables in the statement, and applied this weight to the SBFL formula. To evaluate the effectiveness of MCPFL, we conducted empirical studies on six student program datasets collected in our OJ system, and the results showed that MCPFL can effectively improve the traditional SBFL methods. In particular, our approach improves the Dstar formula by an average of 27.51% on the EXAM metric.
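
    The two ingredients named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' MCPFL implementation: the abstract does not specify the tokenization, the TF-IDF variant, or the weighting formula, so the helper names (`tokenize`, `tfidf`, `dstar`), the identifier-only tokenizer, and the plain `tf * log(N / df)` weighting are all assumptions. The Dstar formula is given in its commonly cited form, ef^star / (ep + nf), with star = 2. The step that combines the two scores is deliberately left out because the abstract does not give it.

```python
import math
import re
from collections import Counter


def dstar(ef, ep, nf, star=2):
    """Dstar suspiciousness of one statement: ef**star / (ep + nf).

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do NOT execute the statement
    """
    denom = ep + nf
    if denom == 0:
        # Executed by every failing test and by no passing test.
        return math.inf if ef > 0 else 0.0
    return ef ** star / denom


def tokenize(source):
    """Split source text into identifier-like terms (a simplifying assumption)."""
    return re.findall(r"[A-Za-z_]\w*", source)


def tfidf(faulty, correct_programs):
    """TF-IDF of each term in the faulty program.

    The document set is the correct programs plus the faulty one, so every
    term of the faulty program has a document frequency of at least 1.
    """
    docs = list(correct_programs) + [faulty]
    doc_terms = [set(tokenize(doc)) for doc in docs]
    tokens = tokenize(faulty)
    scores = {}
    for term, count in Counter(tokens).items():
        tf = count / len(tokens)
        df = sum(term in terms for terms in doc_terms)
        scores[term] = tf * math.log(len(docs) / df)
    return scores


# Hypothetical toy data: two correct submissions and one faulty submission.
correct = ["total = a + b", "result = a + b"]
faulty = "total = a - c"
scores = tfidf(faulty, correct)
most_suspicious = max(scores, key=scores.get)  # "c": it appears only in the faulty program
print(most_suspicious, dstar(ef=2, ep=1, nf=1))  # prints: c 2.0
```

    In this sketch a term that occurs in every program (here `a`) scores zero, while a term unique to the faulty program scores highest, which is the intuition behind comparing a faulty submission against many correct ones.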

  • A Method of K-Means Clustering Based on TF-IDF for Software Requirements Documents Written in Chinese Language

    Jing ZHU  Song HUANG  Yaqing SHI  Kaishun WU  Yanqiu WANG  

     
    PAPER-Software Engineering
    Publicized: 2021/12/28
    Vol: E105-D  No: 4
    Page(s): 736-754

    Currently there is no way to obtain function points automatically when using the function point analysis (FPA) method, especially for requirements documents written in Chinese. Given the characteristics of Chinese grammar with respect to word segmentation, Chinese words must be segmented accurately so that subsequent entity recognition and disambiguation can be carried out over a smaller range, which lays a solid foundation for efficient automatic extraction of function points. Therefore, this paper proposes a method of K-Means clustering based on TF-IDF and conducts experiments with 24 software requirements documents written in Chinese. The results show that the best clustering effect is achieved when 55% to 75% of the extracted information is retained and the number of clusters takes the middle value of the total number of clusters. The method and conclusions of this paper apply not only to Chinese: they also provide an important reference for automatic extraction of function points from software requirements documents written in other Oriental languages, and fill gaps in data preprocessing in the early stage of automatic function point calculation.
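
    The general technique named in the abstract above, K-Means clustering over TF-IDF vectors, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the paper's method: documents are assumed to be already segmented into non-empty word lists (the Chinese word segmentation step is not shown), the helper names (`tfidf_vectors`, `sq_dist`, `kmeans`) are hypothetical, centroids are seeded with the first k points for determinism, and the information-retention step described in the abstract is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn pre-segmented, non-empty documents (lists of words) into TF-IDF vectors."""
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [counts[w] / len(doc) * math.log(n / df[w]) for w in vocab]
        )
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy requirements, already segmented into words.
docs = [
    ["user", "login", "password"],
    ["report", "export", "pdf"],
    ["user", "login", "account"],
    ["report", "export", "csv"],
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # prints: [0, 1, 0, 1]
```

    Seeding with the first k points keeps the example reproducible; a practical implementation would more likely use randomized or k-means++ initialization and choose k by evaluating clustering quality.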