Authors: XUE Shi-qi (薛诗琦); WANG A-chuan (王阿川) [1]
Affiliation: [1] School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, Issue 6, pp. 1338-1344 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (61975028).
Abstract: Existing models for predicting the severity of software defect reports suffer from low classification accuracy and insufficiently rich deep semantic features. To address these problems, this paper proposes SWF-BERT (Sentence-level and Word-level features Fusion-BERT), a software defect report severity prediction model that fuses BERT sentence-level and word-level features. First, the text of the defect reports is preprocessed. Second, to strengthen the semantic information of the fused features in the embedding layer, the 100 most frequent words are extracted, the feature words related to defect severity are selected from them for keyword embedding, and the result is fused with the other vectors in the embedding layer to form the word embeddings. Finally, the features produced by the BERT output layer (excluding the [CLS] token) are fed into a multi-scale convolutional neural network combined with a long short-term memory network (MC-LSTM), which strengthens the long-distance sequential information between different features. The [CLS] sentence vector output by BERT and the output of the MC-LSTM model are each passed through a linear transformation, and the two results are combined by learnable adaptive weighted fusion to predict the severity of a defect report. Experimental results show that SWF-BERT achieves an average accuracy, recall, and F1 score of 68.41%, 64.60%, and 64.86% on the Mozilla dataset and 61.32%, 62.62%, and 59.31% on the Eclipse dataset, respectively, a substantial improvement over the other classification algorithms compared.
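The keyword-selection step (take the 100 most frequent words, then keep those related to severity) can be illustrated with a small sketch. This is not the authors' code: the lowercase whitespace tokenizer, the `STOP_WORDS` list, and the `SEVERITY_TERMS` filter set are hypothetical stand-ins, since the record does not say how tokens are produced or how severity-related words are judged.

```python
from collections import Counter

# Hypothetical stop-word list; the paper's preprocessing details are not given here.
STOP_WORDS = {"the", "a", "an", "is", "on", "in", "when", "to", "of", "and"}

# Hypothetical vocabulary of severity-related terms used as the filter.
SEVERITY_TERMS = {"crash", "hang", "freeze", "data", "loss", "typo", "cosmetic"}


def top_k_words(reports, k=100):
    """Return the k most frequent non-stop-words across all reports."""
    counts = Counter()
    for text in reports:
        for token in text.lower().split():
            token = token.strip(".,:;!?()[]\"'")  # drop surrounding punctuation
            if token and token not in STOP_WORDS:
                counts[token] += 1
    # most_common keeps first-seen order among words with equal counts
    return [word for word, _ in counts.most_common(k)]


def severity_keywords(reports, k=100, severity_terms=SEVERITY_TERMS):
    """Keep only the top-k frequent words that belong to the severity vocabulary."""
    return [w for w in top_k_words(reports, k) if w in severity_terms]
```

For example, `severity_keywords(["Browser crash when opening a PDF", "Editor crash on save, possible data loss", "Typo in the preferences dialog"])` keeps only the severity-related words among the frequent ones. In practice the filter set would have to come from the actual training corpus rather than a hand-written list.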
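The abstract says the two linearly transformed branch outputs are combined by "learnable adaptive weighted fusion" but does not give the formula. One common parameterization, assumed here and not taken from the paper, normalizes two trainable scalars with a softmax so the weights stay positive and sum to 1. The sketch shows only the forward pass; `w_cls` and `w_mclstm` are hypothetical parameter names.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(cls_logits, mclstm_logits, w_cls, w_mclstm):
    """Fuse two branch outputs with softmax-normalized weights (assumed form).

    cls_logits:      linear projection of the BERT [CLS] vector, one score per class
    mclstm_logits:   linear projection of the MC-LSTM output, one score per class
    w_cls, w_mclstm: trainable scalars (hypothetical); after the softmax they
                     become weights alpha and beta with alpha + beta = 1
    """
    if len(cls_logits) != len(mclstm_logits):
        raise ValueError("both branches must score the same set of classes")
    alpha, beta = softmax([w_cls, w_mclstm])
    fused = [alpha * a + beta * b for a, b in zip(cls_logits, mclstm_logits)]
    return fused, (alpha, beta)
```

With both scalars at zero the branches are weighted equally (0.5 each); during training, gradients on `w_cls` and `w_mclstm` would shift the balance toward whichever branch helps the loss more.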
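The reported figures are averages over severity classes, but the record does not state the averaging scheme. The sketch below assumes macro averaging: it computes per-class precision, recall, and F1, averages each across classes, and reports overall accuracy as well. Under macro averaging the mean F1 is the mean of per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

For instance, with true labels `["major", "major", "minor", "critical"]` and predictions `["major", "minor", "minor", "major"]`, macro precision is 1/3 and macro recall is 1/2, yet macro F1 is 7/18 rather than their harmonic mean of 0.4.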
Keywords: software defect report; severity prediction; keyword embedding; multi-scale convolutional neural network; feature fusion
CLC number: TP311 [Automation and Computer Technology / Computer Software and Theory]