Distant supervision relation extraction based on multi-level attention mechanism and dynamic threshold


Authors: Zhao Hongyan [1]; Zhang Yinggang; Xie Binhong [1]

Affiliation: [1] School of Computer Science & Technology, Taiyuan University of Science & Technology, Taiyuan 030024, China

Source: Application Research of Computers (《计算机应用研究》), 2024, No. 11, pp. 3288-3294 (7 pages)

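Funding: Shanxi Province Basic Research Program (202203021211199); Open Fund of the Shanxi Key Laboratory of Intelligent Information Processing (CICIP2022004); Doctoral Research Start-up Fund of Taiyuan University of Science and Technology (20212075).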

Abstract: Distant supervision relation extraction suffers from a data quality problem: the automatically generated datasets contain several types of noise, including noisy words, noisy sentences, and noisy bags. Existing research focuses mainly on noisy sentences and ignores the other types, so it cannot eliminate the noise completely. To address this, the paper proposes a distant supervision relation extraction model based on a multi-level attention mechanism and a dynamic threshold (MADT). The model first uses a pre-trained language model to obtain semantic representations of entity pairs, then uses a bidirectional gated recurrent unit and a self-attention mechanism to obtain semantic features that carry keyword information, and then combines these with the deep contextual representation of the sentence to handle the three types of noise in turn. In addition, the paper proposes a dynamic threshold method that further removes noisy sentences and strengthens the contribution of positive sentences to the bag representation, and it uses an attention mechanism based on semantic similarity to reduce the impact of noisy bags. Experiments on the NYT10d and NYT10m datasets show that MADT addresses noise at every level of distant supervision relation extraction and effectively improves relation extraction performance.
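The sentence-level step described in the abstract (attention over the sentences of a bag, followed by a threshold that drops likely noisy sentences) can be illustrated with a minimal sketch. The abstract does not say how the dynamic threshold is computed, so the rule used here is an assumption made only for illustration: the threshold is the uniform weight 1/n of a bag with n sentences, so it adapts to bag size. The names `bag_representation`, `sentence_vecs`, and `relation_query` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bag_representation(sentence_vecs, relation_query):
    """Pool sentence vectors into one bag vector with thresholded attention.

    ASSUMPTION (not from the paper): the threshold is 1/n for a bag of n
    sentences, i.e. a sentence is kept only if its attention weight is at
    least the uniform weight.
    """
    # Attention weight of each sentence with respect to the relation query.
    weights = softmax([dot(s, relation_query) for s in sentence_vecs])

    # Bag-dependent threshold (illustrative choice, see docstring).
    threshold = 1.0 / len(weights)
    kept = [i for i, w in enumerate(weights) if w >= threshold]
    if not kept:
        # Guard against rounding: always keep the highest-weight sentence.
        kept = [max(range(len(weights)), key=lambda i: weights[i])]

    # Renormalise the surviving weights and take the weighted sum.
    kept_total = sum(weights[i] for i in kept)
    dim = len(sentence_vecs[0])
    bag = [0.0] * dim
    for i in kept:
        w = weights[i] / kept_total
        for d in range(dim):
            bag[d] += w * sentence_vecs[i][d]
    return bag, kept


if __name__ == "__main__":
    # Toy bag: two sentences aligned with the query, one pointing away from it.
    sentences = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
    query = [2.0, 0.0]
    bag, kept = bag_representation(sentences, query)
    print("kept sentence indices:", kept)
    print("bag vector:", bag)
```

In this toy bag the third sentence receives a weight far below 1/3 and is dropped, so the bag vector is built only from the first two sentences. A bag-level attention based on semantic similarity, as mentioned in the abstract, could be layered on top in the same way, but its exact form is likewise not specified here.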

Keywords: distant supervision relation extraction; self-attention mechanism; dynamic threshold; pre-trained language model; denoising

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
