CodeBERT-based Language Model for Design Patterns


Authors: CHEN Shifei, LIU Dong, JIANG He [1]

Affiliation: [1] School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116620, China

Source: Computer Science (《计算机科学》), 2023, No. 12, pp. 75-81 (7 pages)

Funding: National Natural Science Foundation of China (61722202).

Abstract: Design patterns summarize experience from practical software design and are an effective aid to software design. Most existing work on design pattern mining aims to recognize design pattern instances in source code; modelling design patterns with natural-language corpora remains largely unexplored. To improve the recommendation performance of language models that classify design patterns, taking code, class diagrams, and object collaboration into account, a CodeBERT-based design pattern classification and mining model named dpCodeBERT is proposed, which supports a comparative understanding of design patterns across natural language and programming language. First, a multi-class classification dataset and a code search dataset are synthesized by random combination and used as model input, and dpCodeBERT is used to obtain the attention weights that the transformer layers assign to each token. Second, the token-level and statement-level attention weights are analyzed to identify the most effective categories of input, and the training input is refined accordingly. Finally, dpCodeBERT is applied to specific software engineering downstream tasks, such as design pattern selection and design pattern code search, by mapping distributed features to the sample space through fully connected layers and outputting multiple values. Experiments on a design pattern selection dataset of 80 software design problems show that, compared with baseline models, dpCodeBERT improves the ratio of correct detection of design pattern (RCDDP) and the mean reciprocal rank (MRR) by 10% to 20% on average, making design pattern selection more accurate. Through an in-depth study of the model's data requirements, dpCodeBERT draws on CodeBERT's understanding of class-level code and explores the application of CodeBERT to design pattern mining. It offers accurate prediction and good extensibility.
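The two metrics named in the abstract can be illustrated with a small sketch. MRR is computed in the standard way, as the mean over problems of 1/rank of the correct pattern. The abstract does not define RCDDP precisely, so the `top1_ratio` function below is only an assumed reading (the fraction of design problems whose top-ranked pattern is the correct one). The function names, pattern names, and example rankings are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], correct: str) -> float:
    """Return 1/rank of the correct pattern, or 0.0 if it is not in the list."""
    for position, pattern in enumerate(ranked, start=1):
        if pattern == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         answers: Sequence[str]) -> float:
    """Standard MRR: average reciprocal rank over all design problems."""
    if not answers or len(rankings) != len(answers):
        raise ValueError("rankings and answers must be non-empty and aligned")
    total = sum(reciprocal_rank(r, a) for r, a in zip(rankings, answers))
    return total / len(answers)


def top1_ratio(rankings: Sequence[Sequence[str]],
               answers: Sequence[str]) -> float:
    """Assumed reading of RCDDP: share of problems whose first
    recommendation is the correct pattern (not confirmed by the source)."""
    if not answers or len(rankings) != len(answers):
        raise ValueError("rankings and answers must be non-empty and aligned")
    hits = sum(1 for r, a in zip(rankings, answers) if r and r[0] == a)
    return hits / len(answers)


# Hypothetical model output for three design problems (illustrative only).
rankings = [
    ["Observer", "Strategy", "Adapter"],   # correct pattern at rank 1
    ["Strategy", "Adapter", "Observer"],   # correct pattern at rank 2
    ["Adapter", "Observer", "Strategy"],   # correct pattern at rank 3
]
answers = ["Observer", "Adapter", "Strategy"]

mrr = mean_reciprocal_rank(rankings, answers)   # (1 + 1/2 + 1/3) / 3
top1 = top1_ratio(rankings, answers)            # 1 of 3 problems
```

MRR rewards a model for placing the correct pattern near the top even when it is not first, whereas a top-1 ratio only counts exact first-place hits, which is why the two are usually reported together.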

Keywords: design pattern mining; natural language processing; pre-trained language model; CodeBERT; model fine-tuning; vectorization

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
