Bayesian network and RoBERTa ensembles for text derivation relation mining

Original title: 基于贝叶斯网和RoBERTa的文本派生关系挖掘方法


Authors: ZHUANG Yuan (庄园); WENG Nian-feng (翁年凤); LI Jie (李杰)[1,2]

Affiliations: [1] School of Computer Science, School of Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China; [2] The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, Jiangsu, China; [3] Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, Hunan, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 9, pp. 2690-2696 (7 pages)

Funding: National Natural Science Foundation of China (61371196); National Science and Technology Major Project (2015ZX01040201-003).

Abstract: Tracing the provenance of false information is an important means of curbing its spread in social networks. Traditional data provenance methods mainly target structured data and struggle to determine derivation relations between texts accurately. To address these problems, a text derivation relation mining method based on a Bayesian network and RoBERTa is proposed. Text vectors are obtained with the RoBERTa model. The RoBERTa model is also used to make a preliminary prediction of the derivation relation between texts, yielding a classification label that indicates whether the texts have a derivation relation. A Bayesian network is then constructed from vector distance, text distance, time span, and the text classification label to judge the text derivation relation. Experimental results show that the precision, recall, and F1 score of the proposed method are all higher than those of the comparison methods, verifying the effectiveness of the method.
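The pipeline summarized in the abstract (distance features plus a classifier label, fused by a Bayesian network) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the network structure, the discretization thresholds, or the conditional probabilities, so the naive-Bayes-shaped structure, the names `PRIOR`, `CPT`, `discretize`, and `posterior_derived`, and every number below are assumptions. In practice the input vectors and the classifier label would come from a RoBERTa model; here they are passed in as plain values so the example runs with the standard library alone.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def edit_distance(s, t):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical prior P(derived) and tables P(evidence is True | derived).
# Assumed structure: "derived" is the single parent of all evidence nodes.
PRIOR = {True: 0.3, False: 0.7}
CPT = {
    "vec_close":    {True: 0.85, False: 0.25},
    "text_close":   {True: 0.75, False: 0.15},
    "time_short":   {True: 0.70, False: 0.40},
    "clf_positive": {True: 0.90, False: 0.10},
}


def discretize(vec_a, vec_b, text_a, text_b, hours_apart, clf_label):
    """Map raw features to binary evidence; all thresholds are illustrative."""
    norm_edit = edit_distance(text_a, text_b) / max(len(text_a), len(text_b), 1)
    return {
        "vec_close": cosine_distance(vec_a, vec_b) < 0.2,
        "text_close": norm_edit < 0.5,
        "time_short": hours_apart < 24,
        "clf_positive": clf_label == 1,
    }


def posterior_derived(evidence):
    """Exact P(derived = True | evidence) by enumerating the parent node."""
    joint = {}
    for derived in (True, False):
        p = PRIOR[derived]
        for name, observed in evidence.items():
            p_true = CPT[name][derived]
            p *= p_true if observed else 1.0 - p_true
        joint[derived] = p
    return joint[True] / (joint[True] + joint[False])


if __name__ == "__main__":
    evidence = discretize([0.9, 0.1, 0.3], [0.8, 0.2, 0.3],
                          "the dam has collapsed",
                          "breaking: the dam has collapsed",
                          hours_apart=3, clf_label=1)
    print(evidence, posterior_derived(evidence))
```

With a structure like this, the classifier label is treated as one piece of evidence among several rather than as the final decision, so agreement among the distance and time features can reinforce or override it. A learned structure and parameters estimated from labeled text pairs would replace the hand-set tables in any real use.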

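Keywords: data provenance; text derivation; Bayesian network; pre-trained language model; derivation relation; text distance; probabilistic model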

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
