A Combined Alignment Model for Code Search

Juntong HONG; Eunjong CHOI; Osamu MIZUNO

doi:10.1587/transinf.2023MPP0002

IEICE TRANSACTIONS on Information

Summary:

Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep learning based methods that use a multi-encoder model to parse code into multiple fields to represent it. These methods enhance model performance by distinguishing between similar code snippets and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling discards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to highlight the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code and query embeddings and compute their cosine similarity. To evaluate our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 scores of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.
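The matching steps named in the summary can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the summary gives no formulas, so the dot-product relation matrix, the way the "trainable vectors" weight it (`vector_pool_scores`), the softmax normalization, and all function names are assumptions made for illustration. The intra-modal attention and the encoder are omitted, the vectors `w_query` and `w_code` are fixed here rather than trained, and `top_at_k` is one common reading of the Top@1 metric.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relation_matrix(code_emb, query_emb):
    """R[i][j] = dot product of code token i and query token j (assumed form)."""
    return [[dot(c, q) for q in query_emb] for c in code_emb]

def max_pool_scores(R):
    """Max-pooling baseline: keep only the best match in each row of R."""
    return [max(row) for row in R]

def vector_pool_scores(R, w):
    """Assumed alternative: mix every entry of a row using a weight vector w."""
    return [dot(row, w) for row in R]

def weighted_sum(emb, weights):
    """Collapse token embeddings into one vector using attention weights."""
    dim = len(emb[0])
    return [sum(w * tok[d] for w, tok in zip(weights, emb)) for d in range(dim)]

def match_score(code_emb, query_emb, w_query, w_code):
    """Cross-modal attention over R, then cosine similarity of pooled vectors."""
    R = relation_matrix(code_emb, query_emb)
    R_T = [list(col) for col in zip(*R)]
    code_attn = softmax(vector_pool_scores(R, w_query))    # one weight per code token
    query_attn = softmax(vector_pool_scores(R_T, w_code))  # one weight per query token
    return cosine(weighted_sum(code_emb, code_attn),
                  weighted_sum(query_emb, query_attn))

def top_at_k(ranked_ids_per_query, gold_ids, k=1):
    """Fraction of queries whose correct snippet is among the first k results."""
    hits = sum(1 for ranked, gold in zip(ranked_ids_per_query, gold_ids)
               if gold in ranked[:k])
    return hits / len(gold_ids)

# Toy embeddings: 3 code tokens and 2 query tokens in a 2-dimensional space.
code_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_emb = [[1.0, 0.0], [0.5, 0.5]]
score = match_score(code_emb, query_emb,
                    w_query=[0.5, 0.5], w_code=[1 / 3, 1 / 3, 1 / 3])
print(round(score, 3))
```

Max-pooling keeps a single value per row of the relation matrix, whereas the weighted variant lets every code-query token pair contribute to the attention weight, which is the kind of information loss the summary refers to. Because the weight vectors have fixed lengths, a real model would need padded or truncated sequences.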

Publication: IEICE TRANSACTIONS on Information Vol.E107-D No.3 pp.257-267

Publication Date: 2024/03/01

Publicized: 2023/12/15

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2023MPP0002

Type of Manuscript: Special Section PAPER (Special Section on Empirical Software Engineering)

Authors

Juntong HONG
  Kyoto Institute of Technology
Eunjong CHOI
  Kyoto Institute of Technology
Osamu MIZUNO
  Kyoto Institute of Technology

Keyword

code search, deep learning, code analysis

Cite this

Juntong HONG, Eunjong CHOI, Osamu MIZUNO, "A Combined Alignment Model for Code Search" in IEICE TRANSACTIONS on Information, vol. E107-D, no. 3, pp. 257-267, March 2024, doi: 10.1587/transinf.2023MPP0002.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2023MPP0002/_p

@ARTICLE{e107-d_3_257,
author={Juntong HONG and Eunjong CHOI and Osamu MIZUNO},
journal={IEICE TRANSACTIONS on Information},
title={A Combined Alignment Model for Code Search},
year={2024},
volume={E107-D},
number={3},
pages={257--267},
abstract={Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep learning based methods that use a multi-encoder model to parse code into multiple fields to represent it. These methods enhance model performance by distinguishing between similar code snippets and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling discards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to highlight the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code and query embeddings and compute their cosine similarity. To evaluate our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 scores of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.},
keywords={code search, deep learning, code analysis},
doi={10.1587/transinf.2023MPP0002},
ISSN={1745-1361},
month={March},}

TY  - JOUR
TI  - A Combined Alignment Model for Code Search
T2  - IEICE TRANSACTIONS on Information
SP  - 257
EP  - 267
AU  - Juntong HONG
AU  - Eunjong CHOI
AU  - Osamu MIZUNO
PY  - 2024
DO  - 10.1587/transinf.2023MPP0002
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E107-D
IS  - 3
JA  - IEICE TRANSACTIONS on Information
Y1  - March 2024
AB  - Code search is the task of retrieving the most relevant code given a natural language query. Several recent studies have proposed deep learning based methods that use a multi-encoder model to parse code into multiple fields to represent it. These methods enhance model performance by distinguishing between similar code snippets and by utilizing a relation matrix to bridge the code and the query. However, these models require more computational resources and parameters than single-encoder models. Furthermore, a relation matrix that relies solely on max-pooling discards word alignment information. To alleviate these problems, we propose a combined alignment model for code search. We concatenate the multiple code fields into one sequence to represent the code and use a single encoding model to encode code features. Moreover, we transform the relation matrix using trainable vectors to avoid information loss. Then, we combine intra-modal and cross-modal attention to highlight the salient words while matching the corresponding code and query. Finally, we apply the attention weights to the code and query embeddings and compute their cosine similarity. To evaluate our model, we compare it with six previous models on two popular datasets. The results show that our model achieves Top@1 scores of 0.614 and 0.687, outperforming the best comparison models by 12.2% and 9.3%, respectively.
ER  -
