Authors: Zhang Xiaoyan (张小艳)[1]; Li Wei (李薇) (College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710600, Shaanxi, China)
Affiliation: [1] College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an 710600, Shaanxi, China
Source: Computer Applications and Software (《计算机应用与软件》), 2024, No. 8, pp. 275-281, 366 (8 pages)
Funding: Young Scientists Fund of the National Natural Science Foundation of China (61702408).
Abstract: To address the insufficient feature extraction capability of traditional Siamese-network-based models for text semantic similarity, a model that combines a Siamese network with the RoBERTa pre-trained model, SRoberta-SelfAtt, is proposed. Within the Siamese architecture, the RoBERTa (a robustly optimized BERT pretraining approach) pre-trained model encodes each text of the original pair into character-level vectors, and a self-attention mechanism captures the associations between different characters within each text. A pooling strategy then produces a sentence vector for each text, and the two representations are interacted and fused. A fully connected layer computes the loss and scores the semantic similarity of the text pair. Experiments on three datasets covering two types of tasks show that the proposed model improves on competing models and provides an effective basis for further work on optimizing the accuracy of text semantic similarity calculation.
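The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of a Siamese RoBERTa encoder with an added self-attention layer, mean pooling, and a [u; v; |u-v|] fusion head; the checkpoint name hfl/chinese-roberta-wwm-ext, the mean-pooling choice, the fusion scheme, and the two-class output head are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class SiameseRobertaSelfAtt(nn.Module):
    """Sketch: Siamese RoBERTa encoder + self-attention + pooling + fusion head."""

    def __init__(self, model_name="hfl/chinese-roberta-wwm-ext", num_labels=2):
        super().__init__()
        # One shared encoder is used for both branches (Siamese weight sharing).
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Extra self-attention layer over the character-level token vectors.
        self.self_att = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # Fusion of the two sentence vectors as [u; v; |u - v|] (assumed scheme).
        self.classifier = nn.Linear(hidden * 3, num_labels)

    def encode(self, input_ids, attention_mask):
        token_vecs = self.encoder(input_ids=input_ids,
                                  attention_mask=attention_mask).last_hidden_state
        att_out, _ = self.self_att(token_vecs, token_vecs, token_vecs,
                                   key_padding_mask=~attention_mask.bool())
        # Mean pooling over non-padding tokens -> one sentence vector per text.
        mask = attention_mask.unsqueeze(-1).float()
        return (att_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        u = self.encode(ids_a, mask_a)
        v = self.encode(ids_b, mask_b)
        fused = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(fused)  # logits; train with cross-entropy loss


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
    model = SiameseRobertaSelfAtt()
    a = tok(["今天天气很好"], return_tensors="pt", padding=True)
    b = tok(["今天的天气不错"], return_tensors="pt", padding=True)
    logits = model(a["input_ids"], a["attention_mask"],
                   b["input_ids"], b["attention_mask"])
    print(logits.shape)  # torch.Size([1, 2])
```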