Cross-Document SPO Extraction with BERT-WWM Pre-training  Cited by: 2


Author: Zhang Zhenzeng (Linewell Software Co., Ltd., Quanzhou 362000, Fujian, China)

Affiliation: [1] Linewell Software Co., Ltd., Quanzhou 362000, Fujian, China

Source: Computer Applications and Software, 2023, No. 6, pp. 181-186, 215 (7 pages)

Abstract: Current research on cross-document subject-predicate-object (SPO) triple extraction is mainly based on sentence-level analysis. In many scenarios, however, the elements of an SPO triple may be scattered across different positions in a document, and sentence-level extraction techniques fall far short of the requirements. A joint SPO extraction model, Doc2SpSPO, is therefore proposed. The model generates initial entity information with a Span candidate set model, then uses the BERT-WWM pre-trained model to obtain embeddings of the context and of the candidate entities, which feed classification tasks that jointly extract the SPO triples. Experimental results show that the model reaches an F1 score of 44.4% for entity recognition and an accuracy of 66.9% for relation classification.
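The abstract does not spell out how the Span candidate set is built, so the following is only a minimal sketch of one common way to do it: enumerate every token span up to a maximum width, then pair non-overlapping spans as (subject, object) candidates for a downstream relation classifier. The function names `enumerate_spans` and `candidate_pairs` and the `max_width` parameter are hypothetical and do not come from the paper; the BERT-WWM encoding and classification steps are omitted.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_width: int = 3) -> List[Span]:
    """Return every contiguous span of 1..max_width tokens (assumed candidate rule)."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return ordered (subject, object) pairs of spans that do not overlap."""
    pairs: List[Tuple[Span, Span]] = []
    for subj in spans:
        for obj in spans:
            # Two spans are disjoint when one ends before the other starts.
            if subj[1] <= obj[0] or obj[1] <= subj[0]:
                pairs.append((subj, obj))
    return pairs


if __name__ == "__main__":
    tokens = ["Linewell", "Software", "Quanzhou"]
    spans = enumerate_spans(tokens, max_width=2)
    for subj, obj in candidate_pairs(spans):
        print(tokens[subj[0]:subj[1]], "->", tokens[obj[0]:obj[1]])
```

In a full pipeline each candidate span and pair would be scored by a classifier over contextual embeddings; the number of pairs grows quadratically with the number of spans, which is why a width limit or a pruning step is usually needed at document level.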

Keywords: cross-document triple extraction; BERT; Span rules; joint entity-relation extraction model

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
