Authors: XU Yang (徐杨)[1]; CHEN Xiao-Jie (陈晓杰); TANG De-You (汤德佑)[1]; HUANG Han (黄翰)[1] (College of Software Engineering, South China University of Technology, Guangzhou 510006, China)
Source: Journal of Software (《软件学报》), 2024, No. 8, pp. 3809-3823 (15 pages)
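Funding: General Program of the Natural Science Foundation of Guangdong Province (2020A1515010696, 2022A1515011491); General Program of the National Natural Science Foundation of China (61876207, 62276103); General Project for Central Universities (2020ZYGXZR014); Open Fund of the Guangdong Provincial Key Laboratory of Fiscal and Taxation Big Data (2019B121203012).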
Abstract: Improving the accuracy of matching heterogeneous natural language queries against highly structured programming language source code is a fundamental problem in code search. Accurate extraction of code features is one of the keys to improving matching accuracy. The semantics expressed by a code statement depend not only on the statement itself but also on its context, and structural models of code provide rich contextual information for understanding what the code does. This study proposes a code search method based on function multigraph embedding. Using an early fusion strategy, the method fuses the data dependencies of code statements into the control flow graph to construct a function multigraph that represents the code. Through its data dependency edges, the multigraph explicitly expresses dependencies between nodes that are not direct predecessors or successors, which the control flow graph lacks, and thereby enriches the contextual information of statement nodes. To handle the heterogeneity of the multigraph's edges, a relational graph convolutional network is used to extract code features from the function multigraph. Experiments on a public dataset show that the proposed method improves MRR by more than 5% over existing methods based on code text and structural models. Ablation experiments also show that the control flow graph contributes more to search accuracy than the data dependence graph.
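The early fusion idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy function, the edge lists, the names `build_multigraph` and `relational_aggregate`, and the scalar weights are all assumptions made for illustration. In particular, a real relational graph convolutional network learns a weight matrix per relation, whereas this sketch uses one fixed scalar per relation to keep the aggregation step readable.

```python
from collections import defaultdict

# Hypothetical toy function, one statement per node:
#   0: x = read()
#   1: if x > 0:
#   2:     y = x * 2
#   3: else: y = 0
#   4: print(y)
CONTROL_FLOW = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]
DATA_DEP = [(0, 1), (0, 2), (2, 4), (3, 4)]


def build_multigraph(control_edges, data_edges):
    """Fuse both edge sets into one list of (src, dst, relation) edges.

    Parallel edges between the same pair of nodes are kept when their
    relations differ, which is what makes the result a multigraph.
    """
    edges = [(s, d, "control") for s, d in control_edges]
    edges += [(s, d, "data") for s, d in data_edges]
    return edges


def relational_aggregate(features, edges, rel_weights, self_weight=1.0):
    """One simplified relation-aware aggregation step.

    For each node, the incoming neighbours are averaged separately per
    relation, scaled by that relation's scalar weight (an assumption that
    stands in for a learned weight matrix), added to the node's own
    features, and passed through ReLU.
    """
    dim = len(next(iter(features.values())))
    incoming = defaultdict(list)  # (dst, relation) -> list of src nodes
    for src, dst, rel in edges:
        incoming[(dst, rel)].append(src)

    updated = {}
    for node, h in features.items():
        out = [self_weight * v for v in h]
        for rel, w in rel_weights.items():
            neighbours = incoming.get((node, rel), [])
            for src in neighbours:
                for k in range(dim):
                    out[k] += w * features[src][k] / len(neighbours)
        updated[node] = [max(0.0, v) for v in out]  # ReLU
    return updated


edges = build_multigraph(CONTROL_FLOW, DATA_DEP)
# One-hot statement features, so each output row shows which nodes contributed.
features = {i: [1.0 if k == i else 0.0 for k in range(5)] for i in range(5)}
updated = relational_aggregate(features, edges, {"control": 0.5, "data": 0.25})
```

In this toy graph, node 2 has node 1 as its only control flow predecessor, yet after one step `updated[2]` also carries a contribution from node 0, which reaches it only through the data dependency edge. That is the kind of non-adjacent context the abstract says the control flow graph alone does not express.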
Classification: TP311 [Automation and Computer Technology - Computer Software and Theory]