Code-search-oriented Function Multigraph Embedding (面向代码搜索的函数功能多重图嵌入)


Authors: XU Yang (徐杨)[1], CHEN Xiao-Jie (陈晓杰), TANG De-You (汤德佑)[1], HUANG Han (黄翰)[1]

Affiliation: [1] College of Software Engineering, South China University of Technology, Guangzhou 510006, Guangdong, China

Source: Journal of Software (《软件学报》), 2024, Issue 8, pp. 3809-3823 (15 pages)

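Funding: Natural Science Foundation of Guangdong Province, General Program (2020A1515010696, 2022A1515011491); National Natural Science Foundation of China, General Program (61876207, 62276103); Fundamental Research Funds for the Central Universities, General Program (2020ZYGXZR014); Open Fund of the Guangdong Provincial Key Laboratory of Fiscal and Tax Big Data (2019B121203012).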

Abstract: Improving the accuracy of matching between heterogeneous natural-language queries and highly structured source code is a fundamental problem in code search. Accurate extraction of code features is one of the keys to improving matching accuracy. The semantics of a code statement depend not only on the statement itself but also on its context, and structural models of code provide rich contextual information for understanding what the code does. This study proposes a code search method based on function multigraph embedding. Using an early-fusion strategy, the method fuses the data dependencies between code statements into the control flow graph, constructing a function multigraph to represent the code. Through data dependencies, the multigraph explicitly expresses the dependencies between nodes that are not direct predecessors or successors, which the control flow graph lacks, and thereby enriches the contextual information of statement nodes. To handle the heterogeneity of the multigraph's edges, a relational graph convolutional network is used to extract code features from the function multigraph. Experiments on a public dataset show that the proposed method improves the mean reciprocal rank (MRR) by more than 5% compared with existing methods based on code text and structural models. Ablation experiments also show that the control flow graph contributes more to search accuracy than the data dependence graph.
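The early-fusion idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`build_multigraph`, `rgcn_layer`), the four-statement toy program, the identity weights, the mean normalization, and the choice to pass messages along edge direction are all assumptions made for illustration. The layer follows the general relational graph convolution form, h_i' = ReLU(W_0 h_i + Σ_r Σ_{j∈N_i^r} (1/|N_i^r|) W_r h_j), with one weight matrix per edge type.

```python
from collections import defaultdict

def build_multigraph(cfg_edges, dd_edges):
    """Early fusion (illustrative): merge control-flow and data-dependence
    edges into one typed edge list. Parallel edges of different types
    between the same node pair are kept, which makes it a multigraph."""
    edges = [(u, v, "cfg") for u, v in cfg_edges]
    edges += [(u, v, "dd") for u, v in dd_edges]
    return edges

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rgcn_layer(num_nodes, edges, feats, rel_weights, self_weight):
    """One relation-aware propagation step with a separate weight matrix
    per edge type and mean normalization over each node's incoming
    neighbours of that type (an assumed, simplified R-GCN layer)."""
    incoming = defaultdict(list)
    for u, v, rel in edges:
        incoming[(v, rel)].append(u)
    out = []
    for i in range(num_nodes):
        h = matvec(self_weight, feats[i])           # self-loop term W_0 h_i
        for rel, W in rel_weights.items():
            js = incoming.get((i, rel), [])
            for j in js:
                msg = matvec(W, feats[j])           # W_r h_j
                h = [a + m / len(js) for a, m in zip(h, msg)]
        out.append([max(0.0, a) for a in h])        # ReLU
    return out

# Hypothetical toy function with four statements:
#   0: x = read()   1: y = x + 1   2: log("step")   3: print(x + y)
cfg_edges = [(0, 1), (1, 2), (2, 3)]   # sequential control flow
dd_edges = [(0, 1), (0, 3), (1, 3)]    # x flows to 1 and 3, y flows to 3
edges = build_multigraph(cfg_edges, dd_edges)

identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
feats = identity                        # one-hot statement features
rel_weights = {"cfg": identity, "dd": identity}
out = rgcn_layer(4, edges, feats, rel_weights, identity)
```

In this toy case statement 3 has statement 2 as its only control-flow predecessor, yet its updated vector also carries contributions from statements 0 and 1 through the data-dependence edges. That is the kind of non-adjacent context the abstract says the control flow graph alone lacks.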

Keywords: code search; control flow graph; data dependence graph; function multigraph

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
