Multi-label code smell detection method based on a pre-trained model and BiLSTM-CNN


Authors: LIU Haiyang[1]; ZHANG Yang[1]; TIAN Quanquan; WANG Xiaohong[1]

Affiliation: [1] School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China

Source: Hebei Journal of Industrial Science and Technology, 2024, Issue 5, pp. 330-335 (6 pages)

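Funding: National Natural Science Foundation of China (61440012); Natural Science Foundation of Hebei Province (F2023208001); Funding Project for Introduced Overseas Scholars of Hebei Province (C20230358).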

Abstract: To improve the accuracy of multi-label code smell detection, this paper proposes DMSmell (Deep Multi-Smell), a multi-label code smell detection method based on a pre-trained model and BiLSTM-CNN. First, a static analysis tool is used to extract textual information and structural metric information from the source code, and two detection rules are applied to label code smell instances. Second, the CodeBERT pre-trained model generates word vectors for the textual information, and BiLSTM and CNN perform deep feature extraction on the word vectors and the structural metric information, respectively. Finally, an attention mechanism and a multi-layer perceptron are combined to detect multi-label code smells, and the performance of DMSmell is evaluated. The results show that DMSmell improves the accuracy of multi-label code smell detection to a certain extent: compared with the classifier-chain-based method, the exact match ratio increases by 1.36 percentage points, micro-recall by 2.45 percentage points, and micro-F1 by 1.1 percentage points. This indicates that combining textual information with structural metric information, and using deep learning for feature extraction and classification, can effectively improve the accuracy of code smell detection, providing an important reference for research on and application of multi-label code smell detection.
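The three evaluation measures named in the abstract (exact match ratio, micro-recall, micro-F1) can be illustrated with a minimal sketch. It assumes the usual multi-label definitions over binary indicator vectors; this record does not give the paper's own formulas, and the function names and toy data below are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Labels = List[List[int]]  # one row per code sample, one 0/1 entry per smell type


def exact_match_ratio(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of samples whose whole label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def micro_counts(y_true: Labels, y_pred: Labels) -> Tuple[int, int, int]:
    """Pool true positives, false positives, false negatives over all labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    return tp, fp, fn


def micro_recall(y_true: Labels, y_pred: Labels) -> float:
    """Micro-averaged recall: TP / (TP + FN) over the pooled counts."""
    tp, _, fn = micro_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Micro-averaged F1: 2*TP / (2*TP + FP + FN) over the pooled counts."""
    tp, fp, fn = micro_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): 4 samples, 3 smell types.
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
Y_PRED = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 0]]

if __name__ == "__main__":
    print(exact_match_ratio(Y_TRUE, Y_PRED))
    print(micro_recall(Y_TRUE, Y_PRED))
    print(micro_f1(Y_TRUE, Y_PRED))
```

Exact match ratio is the strictest of the three, since a sample counts only when every smell label is right. The micro-averaged measures pool decisions across all labels, so frequent smell types weigh more heavily.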

Keywords: software engineering; code smell; pre-trained model; multi-label classification; deep learning

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
