Authors: LI Ying-ling (李英玲); LAN Hong-fu (兰宏富); LI Ran (李苒); HUANG Min-ying (黄闽英)[2,3]
Affiliations: [1] School of Computer Science and Engineering, Southwest Minzu University, Chengdu 610041, Sichuan, China; [2] The Key Laboratory for Computer Systems of State Ethnic Affairs Commission, Southwest Minzu University, Chengdu 610041, Sichuan, China; [3] Business School, Southwest Minzu University, Chengdu 610041, Sichuan, China
Source: Journal of Southwest Minzu University (Natural Science Edition) (《西南民族大学学报(自然科学版)》), 2023, No. 2, pp. 189-196 (8 pages)
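Funding: Key Project of the Miaozi (Seedling) Program of the Sichuan Provincial Department of Science and Technology (2021JDRC0066); Research Start-up Fund of Southwest Minzu University (RQD2021096).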
Abstract: Understanding the maintenance activities performed in a software repository helps ensure efficient evolution and development. Accurately classifying code commits helps software managers allocate resources more reasonably and thus reduces maintenance costs. However, existing studies either used only static keywords in commit messages while ignoring the context of those keywords, or did not consider the semantics of the changed code, which leads to inaccurate commit classification. This paper proposes CBEC, a commit classification model based on the pre-trained model CodeBERT. CBEC first extracts the code diff of each commit in a public dataset, pairs each commit message with its diff, and tokenizes the pairs up to a maximum token length. It then uses CodeBERT to learn deep semantic representations of the commit messages and diffs, and in parallel extracts hand-crafted commit features from multiple dimensions. Finally, it fuses the semantic features with the hand-crafted features and builds a commit classifier based on a convolutional neural network (CNN). Compared with two representative approaches, CBEC is 5.0% to 26.8% higher in accuracy, 4.9% to 27.2% higher in precision, and 5.4% to 27.3% higher in recall than the baselines. The model can help software practitioners better understand and identify the change intent of code commits, which helps improve the efficiency of software development.
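The fusion step described in the abstract can be illustrated with a minimal, hypothetical sketch. It assumes a simple hand-crafted feature set (added lines, deleted lines and changed files counted from a unified diff, plus binary keyword flags from the commit message) and treats the CodeBERT semantic vector as an already-computed list of floats. The function names, the keyword list and the concatenation-based fusion are illustrative assumptions, not the features or architecture reported for CBEC.

```python
import re
from typing import List

# Hypothetical keyword list; the paper's actual hand-crafted features are not given here.
HYPOTHETICAL_KEYWORDS = ("fix", "bug", "add", "refactor", "update")


def diff_features(diff_text: str) -> List[float]:
    """Count added lines, deleted lines and changed files in a unified diff.

    Simplification: any line starting with '--- ' or '+++ ' is treated as a
    file header, so a removed line whose content begins with '-- ' would be
    misread as a header.
    """
    added = deleted = files = 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            files += 1          # one '+++' header per changed file
            continue
        if line.startswith("--- "):
            continue            # old-file header, not a deleted line
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            deleted += 1
    return [float(added), float(deleted), float(files)]


def message_features(message: str) -> List[float]:
    """Binary flags: does the commit message contain each hypothetical keyword?"""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return [1.0 if kw in words else 0.0 for kw in HYPOTHETICAL_KEYWORDS]


def fuse(semantic: List[float], handcrafted: List[float]) -> List[float]:
    """Early fusion by concatenating the semantic and hand-crafted vectors."""
    return list(semantic) + list(handcrafted)


SAMPLE_DIFF = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,2 +1,4 @@
 def f(x):
-    return x
+    if x is None:
+        return 0
+    return x
"""

if __name__ == "__main__":
    # Placeholder standing in for a CodeBERT embedding of the (message, diff) pair.
    semantic_vec = [0.1, -0.2, 0.3]
    handcrafted = diff_features(SAMPLE_DIFF) + message_features("Fix: handle None input")
    print(fuse(semantic_vec, handcrafted))
```

Concatenation is the simplest way to let a downstream classifier, such as the CNN mentioned in the abstract, see both views of a commit at once. In a real pipeline the semantic vector would come from a CodeBERT encoder (768-dimensional for `microsoft/codebert-base`) rather than a hand-written placeholder, and the hand-crafted features would typically be normalized before fusion.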
Keywords: commit classification; CodeBERT; transfer learning; convolutional neural network
CLC number: TP311.53 [Automation and Computer Technology / Computer Software and Theory]