Multi-factor person entity relation extraction model based on distant supervision  Cited by: 10


Authors: HUANG Yangchen; JIA Yan[1]; GAN Liang[1]; XU Jing[1]; HUANG Jiuming[1]; HE Zhonghe (College of Computer, National University of Defense Technology, Changsha 410073, China; KB R&D Department, Hunan Singhand Intelligent Data Technology Co., Ltd., Changsha 410205, China)

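Affiliations: [1] College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China; [2] KB R&D Department, Hunan Singhand Intelligent Data Technology Co., Ltd., Changsha 410205, Hunan, China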

Source: Journal on Communications (《通信学报》), 2018, No. 7, pp. 103-112 (10 pages)

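Funding: National Key Research and Development Program of China (No. 2016QY03D0601; No. 2016QY03D0603); National Natural Science Foundation of China (No. 61502517); Key Research and Development Program of Hunan Province (No. 2018GK2056)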

Abstract: The basic assumption of distant supervision is too strong and tends to introduce noisy data. To address this, the paper proposes a person entity relation extraction model that denoises the training data automatically generated by distant supervision. In the training data generation stage, the distantly supervised data are filtered using the idea of multiple instance learning together with TF-IDF-based relation indicator word detection, with the goal of bringing the training data up to manual-annotation quality. In the classifier, lexical and syntactic features are combined into multi-factor features that serve as the relation feature vector for training. Experimental results on a large-scale real-world dataset show that the proposed model outperforms comparable relation extraction methods based on distant supervision.
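The abstract does not say how the TF-IDF-based indicator word detection is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that all sentences labelled with one relation form a single "document", that tokens are split on whitespace, and that entity mentions are already replaced by the placeholders `E1` and `E2`. The function names `relation_keywords` and `filter_instances`, the `top_k` parameter, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def relation_keywords(sentences_by_relation, top_k=1):
    """Rank words per relation by TF-IDF.

    Assumption: each relation's sentences are pooled into one document,
    so IDF is computed over relations rather than over sentences.
    """
    counts = {
        rel: Counter(w for s in sents for w in s.split())
        for rel, sents in sentences_by_relation.items()
    }
    n_docs = len(counts)
    # Document frequency: in how many relations does each word occur?
    df = Counter(w for c in counts.values() for w in c)
    keywords = {}
    for rel, c in counts.items():
        total = sum(c.values())
        scores = {w: (n / total) * math.log(n_docs / df[w]) for w, n in c.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Words shared by every relation score 0 and are never indicators.
        keywords[rel] = [w for w in ranked[:top_k] if scores[w] > 0]
    return keywords


def filter_instances(sentences_by_relation, keywords):
    """Keep only sentences that contain an indicator word of their label."""
    return {
        rel: [s for s in sents if any(w in s.split() for w in keywords[rel])]
        for rel, sents in sentences_by_relation.items()
    }


# Toy distantly supervised data (hypothetical): the "meeting" sentences
# mention both entities but do not express the labelled relation.
toy = {
    "spouse": [
        "E1 married E2 in 2010",
        "E1 and E2 attended a meeting",
        "E1 is married to E2",
    ],
    "parent": [
        "E1 is the father of E2",
        "E1 and E2 attended a meeting",
        "E2 thanked his father E1",
    ],
}
keywords = relation_keywords(toy, top_k=1)
filtered = filter_instances(toy, keywords)
```

On this toy input the noisy "attended a meeting" sentences are dropped because they contain no word that is distinctive for their assigned relation. A real system would need proper word segmentation and a far larger corpus for the scores to be meaningful.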

Keywords: relation extraction; person relation; distant supervision; machine learning; natural language processing

Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
