Threat Intelligence Entity Relation Extraction Method Integrating Bootstrapping and Semantic Role Labeling (Cited by: 3)

Authors: CHENG Shunhang (程顺航), LI Zhihua (李志华), WEI Tao (魏涛)

Affiliation: [1] School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China

Source: Journal of Computer Applications (《计算机应用》), 2023, No. 5, pp. 1445-1453 (9 pages)

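Funding: Intelligent Manufacturing Project of the Ministry of Industry and Information Technology (ZH-XZ-180004); Fundamental Research Funds for the Central Universities (JUSRP211A41, JUSRP42003).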

Abstract: To efficiently and automatically mine threat intelligence entities and the relations between them from open-source heterogeneous big data, a Threat Intelligence Entity Relation Extraction (TIERE) method was proposed. First, a data preprocessing method was designed based on an analysis of the characteristics of open-source cyber security reports. Then, to address the high text complexity and the scarcity of standard labeled datasets in the cyber security field, an Improved BootStrapping-based Named Entity Recognition (NER-IBS) algorithm and a Semantic Role Labeling-based Relation Extraction (RE-SRL) algorithm were developed. Initial seeds were constructed from a small number of samples and rules; entities in unstructured text were then mined through iterative training, and relations between entities were mined by constructing semantic roles. Experimental results show that on a few-shot cyber security information extraction dataset, the NER-IBS algorithm achieves an F1 score of 84%, 2 percentage points higher than the RDF-CRF (Regular expression and Dictionary combined with Feature templates as well as Conditional Random Field) algorithm, and the RE-SRL algorithm achieves an F1 score of 94% for uncategorized relation extraction, indicating that the TIERE method extracts entities and relations efficiently.
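The abstract only outlines the two algorithms, so the following is a minimal Python sketch of the general ideas, not the paper's NER-IBS or RE-SRL. It assumes single-token entities, one hypothetical seed rule (a CVE regular expression), a few manually supplied seed samples, one-token left/right context patterns, and semantic role frames already produced by some external SRL labeler in a hypothetical dict format with `V`, `ARG0` and `ARG1` keys. The names `SEED_RULES`, `seeds_from_rules`, `bootstrap_ner`, `relations_from_srl` and the parameter `min_support` are illustrative inventions.

```python
import re
from collections import Counter

# Hypothetical seed rule: CVE identifiers have a fixed surface form, so a
# regular expression can label them without training data (assumed rule).
SEED_RULES = {"VULNERABILITY": re.compile(r"CVE-\d{4}-\d{4,}")}


def context_pattern(tokens, i):
    """One-token left/right context of tokens[i], lowercased."""
    left = tokens[i - 1].lower() if i > 0 else "<s>"
    right = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return (left, right)


def seeds_from_rules(sentences, rules):
    """Label every token that fully matches a seed rule."""
    seeds = {}
    for toks in sentences:
        for tok in toks:
            for etype, rx in rules.items():
                if rx.fullmatch(tok):
                    seeds[tok] = etype
    return seeds


def bootstrap_ner(sentences, seeds, min_support=2, max_iter=5):
    """Grow a token -> type lexicon by alternating pattern induction and
    pattern application until no new entities are found."""
    lexicon = dict(seeds)
    for _ in range(max_iter):
        # Count the contexts in which already-known entities occur.
        counts = Counter()
        for toks in sentences:
            for i, tok in enumerate(toks):
                if tok in lexicon:
                    counts[(context_pattern(toks, i), lexicon[tok])] += 1
        # Keep frequent patterns that point to exactly one entity type.
        types_by_pattern = {}
        for (pat, etype), n in counts.items():
            if n >= min_support:
                types_by_pattern.setdefault(pat, set()).add(etype)
        patterns = {p: next(iter(t)) for p, t in types_by_pattern.items() if len(t) == 1}
        # Label unseen tokens that occur in a trusted context.
        new = {}
        for toks in sentences:
            for i, tok in enumerate(toks):
                if tok not in lexicon:
                    etype = patterns.get(context_pattern(toks, i))
                    if etype is not None:
                        new[tok] = etype
        if not new:
            break
        lexicon.update(new)
    return lexicon


def relations_from_srl(frames, lexicon):
    """Map SRL frames to untyped (head, predicate, tail) triples, keeping a
    frame only when both ARG0 and ARG1 contain a known entity."""
    triples = []
    for frame in frames:
        heads = [t for t in frame.get("ARG0", "").split() if t in lexicon]
        tails = [t for t in frame.get("ARG1", "").split() if t in lexicon]
        triples.extend((h, frame["V"], t) for h in heads for t in tails)
    return triples


# Toy usage: rule seeds plus a few manual samples, then bootstrapping.
sentences = [
    "the Emotet trojan spread".split(),
    "the Emotet trojan returned".split(),
    "the TrickBot trojan stole credentials".split(),
    "attackers exploited CVE-2017-0199 again".split(),
    "attackers exploited CVE-2017-11882 again".split(),
    "attackers exploited Log4Shell again".split(),
]
seeds = seeds_from_rules(sentences, SEED_RULES)
seeds.update({"Emotet": "MALWARE", "APT28": "THREAT_ACTOR"})
lexicon = bootstrap_ner(sentences, seeds)
# lexicon now also contains TrickBot -> MALWARE and Log4Shell -> VULNERABILITY

frames = [
    {"V": "exploited", "ARG0": "APT28", "ARG1": "CVE-2017-0199"},
    {"V": "dropped", "ARG0": "the Emotet trojan", "ARG1": "a loader"},
]
print(relations_from_srl(frames, lexicon))  # → [('APT28', 'exploited', 'CVE-2017-0199')]
```

The design choice worth noting is the `min_support` threshold together with the single-type filter: both limit semantic drift, the usual failure mode of bootstrapping, at the cost of recall. The relation step keeps the predicate as an untyped label, which loosely mirrors the "uncategorized relation extraction" setting reported in the abstract.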

Keywords: entity recognition; relation extraction; threat intelligence; bootstrapping; semantic role labeling

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
