Sentence-level distant supervision relation extraction based on self-adaptive loss function

Times cited: 1


Authors: HU Feng[1]; YANG Xinrui; TANG Chengfu; DENG Weibin[1]; LIU Qun[1] (Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

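Affiliation: [1] Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China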

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2024, Issue 3, pp. 697-706 (10 pages)

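Funding: National Key Research and Development Program of China (2018YFC0832102); Key Cooperation Project of the Chongqing Municipal Education Commission (HZ2021008); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0849).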

Abstract: Distant supervision is a relation extraction approach. Existing methods mainly adopt multi-instance learning, performing relation extraction on bags of instances that share the same entity pair. However, bag-level methods can only alleviate the wrong-labeling problem; they cannot fully solve it. To address this, this paper first analyzes the distributions of clean data and noisy data and proposes a new self-adaptive loss function. On this basis, it proposes a sentence-level distant supervision relation extraction method built on that loss function. Experiments on the public NYT-10 dataset and on a synthetic dataset constructed from TACRED show that the proposed method outperforms the methods in the compared studies: it distinguishes wrongly labeled noisy instances from clean instances more effectively and improves the accuracy of sentence-level distant supervision relation extraction.

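Keywords: natural language processing; information extraction; relation extraction; distant supervision; noise separation; noisy labeling; negative training; self-adaptive loss function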

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
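The abstract does not state the form of the self-adaptive loss, so the following is only a generic sketch of the "negative training" and "noise separation" ideas named in the keywords, not the authors' method. Under negative training, each sentence is paired with a randomly sampled complementary label (a relation it is assumed *not* to express), and the model minimizes -log(1 - p_comp) instead of -log(p_given), which is less sensitive to a wrong distant label. A trained model can then flag sentences whose probability on their distantly supervised label stays low as likely mislabeled. The helper names and the fixed 0.5 threshold are hypothetical assumptions; the paper's own separation criterion is adaptive and may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def negative_training_loss(logits, given_label, num_classes, rng=random):
    """Negative-training loss for one sentence (hypothetical helper).

    Samples a complementary label different from the (possibly noisy)
    given label and penalizes probability mass on it: -log(1 - p_comp).
    Returns (loss, complementary_label).
    """
    candidates = [c for c in range(num_classes) if c != given_label]
    comp = rng.choice(candidates)
    p = softmax(logits)
    # Clamp to avoid log(0) if p_comp is numerically 1.
    return -math.log(max(1.0 - p[comp], 1e-12)), comp


def split_clean_noisy(batch, threshold=0.5):
    """Split sentences into clean / noisy index lists (hypothetical helper).

    batch: list of (logits, given_label) pairs. A sentence is flagged as
    noisy when the model's probability on its given label is below
    `threshold` (an assumed fixed cutoff, not the paper's adaptive one).
    """
    clean, noisy = [], []
    for i, (logits, label) in enumerate(batch):
        (clean if softmax(logits)[label] >= threshold else noisy).append(i)
    return clean, noisy


if __name__ == "__main__":
    rng = random.Random(0)
    loss, comp = negative_training_loss([2.0, 0.5, -1.0], given_label=0,
                                        num_classes=3, rng=rng)
    print(f"NT loss {loss:.4f} on complementary label {comp}")
    batch = [([3.0, 0.0, 0.0], 0), ([0.0, 3.0, 0.0], 0)]
    print(split_clean_noisy(batch))  # ([0], [1])
```

In the example batch, the second sentence is labeled with relation 0 while the model strongly prefers relation 1, so it lands in the noisy list; in a real pipeline the flagged sentences would be relabeled or dropped before retraining.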

 
