Code search model based on collaborative fusion network

Authors: SONG Qihong; LIU Jianxun [1,2]; HU Haize; ZHANG Xiangping

Affiliations: [1] Hunan Key Laboratory of Service Computing and New Software Service Technology (Hunan University of Science and Technology), Xiangtan, Hunan 411201, China; [2] School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan 411201, China

Source: Journal of Computer Applications, 2023, No. 12, pp. 3896-3902 (7 pages)

Funding: National Natural Science Foundation of China (61872139).

Abstract: Searching for and reusing relevant code can substantially improve software development efficiency. Deep learning-based code search models usually embed code snippets and query statements into the same vector space, then match and return the relevant code snippets by computing cosine similarity; however, most of these models ignore the collaborative information between code snippets and query statements. To represent semantic information more fully, a collaborative fusion-based code search model named BofeCS is proposed. First, a BERT (Bidirectional Encoder Representations from Transformers) model is used to extract the semantic information of the input sequences and represent it as vectors. Second, a collaborative fusion network is constructed to extract token-level collaborative information between code snippets and query statements. Finally, a residual network is built to alleviate the loss of semantic information during representation. To evaluate the effectiveness of BofeCS, experiments were carried out on the multilingual dataset CodeSearchNet. The experimental results show that, compared with the baseline models UNIF (embedding UNIFication), TabCS (Two-stage Attention-Based model for Code Search), and MRCS (Multimodal Representation for neural Code Search), BofeCS achieves significantly higher Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain (NDCG), and top-k success hit rate (SR@k), with MRR improved by 95.94%, 52.32%, and 16.95%, respectively.
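The retrieval step and the three metrics named in the abstract can be illustrated with a minimal sketch. It is not the BofeCS implementation: the vectors stand in for embeddings that a model would produce, all function names are hypothetical, and the NDCG variant assumes exactly one relevant snippet per query with binary relevance and no rank cutoff, which may differ from the definition used in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_of_correct(query_vec, code_vecs, correct_idx):
    """1-based rank of the correct snippet after sorting candidates by similarity."""
    scores = [cosine_similarity(query_vec, c) for c in code_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(correct_idx) + 1


def mrr(ranks):
    """Mean Reciprocal Rank over the ranks of the correct snippets."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def success_rate_at_k(ranks, k):
    """SR@k: fraction of queries whose correct snippet is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def ndcg(ranks):
    """NDCG under the assumption of one relevant snippet per query.

    With a single relevant item the ideal DCG is 1, so the per-query
    NDCG reduces to 1 / log2(rank + 1).
    """
    return sum(1.0 / math.log2(r + 1) for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Toy embeddings (assumed values, not taken from the paper).
    query = [1.0, 0.0]
    candidates = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
    print(rank_of_correct(query, candidates, correct_idx=1))

    ranks = [1, 2, 4]  # hypothetical ranks for three queries
    print(mrr(ranks), success_rate_at_k(ranks, 3), ndcg(ranks))
```

Under these assumptions, a model that ranks the correct snippet higher across queries raises all three scores, which is the sense in which the abstract reports improvements over the baselines.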

Keywords: software development; code search; collaborative fusion; BERT; residual network

Classification code: TP311.5 [Automation and Computer Technology: Computer Software and Theory]
