Authors: JIA Zhen (贾真)[1], YE Zhonglin (冶忠林)[1], YIN Hongfeng (尹红风), HE Dake (何大可)[1]
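Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China; [2] DOCOMO Innovations Inc., Palo Alto 94304, USA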
Source: Journal of Chinese Information Processing (《中文信息学报》), 2016, No. 4, pp. 142-149, 158 (9 pages)
Funding: National Natural Science Foundation of China (61170111, 61202043, 61262058)
Abstract: Weakly supervised relation extraction uses known related entity pairs to obtain training data from a text collection automatically, which effectively addresses the shortage of training data. However, weakly supervised training data suffer from noise, insufficient features, and class imbalance, all of which lower relation extraction performance. This paper proposes NF-Tri-training (Tri-training with Noise Filtering), a weakly supervised relation extraction algorithm. It uses under-sampling to address sample imbalance, iteratively learns new samples from unlabeled data based on Tri-training to improve the generalization ability of the classifiers, and applies a data editing technique to identify and remove mislabeled samples from both the initial training data and the new samples generated at each iteration. Experiments on a dataset collected from Hudong encyclopedia (互动百科) show that NF-Tri-training effectively improves the performance of relation classifiers.
Keywords: relation extraction; weakly supervised learning; Tri-training; data editing
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
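The three steps named in the abstract (under-sampling, Tri-training on unlabeled data, and data editing to drop mislabeled samples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the toy 2-D features, the labels "spouse" and "other", the nearest-centroid stand-in classifier, the stratified bootstrap, and the use of an edited-nearest-neighbour rule as the data editing step. The error-rate checks of the original Tri-training procedure are omitted for brevity.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def undersample(X, y, rng):
    """Randomly keep as many samples per class as the rarest class has."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = min(len(v) for v in by_class.values())
    pairs = [(xi, c) for c, v in by_class.items() for xi in rng.sample(v, n)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def edit_noise(X, y, k=3):
    """Assumed editing rule: drop a sample whose label disagrees with the
    majority label of its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: dist2(xi, X[j]))[:k]
        if Counter(y[j] for j in nearest).most_common(1)[0][0] == yi:
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y

def bootstrap(X, y, rng):
    """Stratified bootstrap (an assumption): resample within each class."""
    Xb, yb = [], []
    for c in sorted(set(y)):
        members = [xi for xi, yi in zip(X, y) if yi == c]
        for _ in members:
            Xb.append(rng.choice(members))
            yb.append(c)
    return Xb, yb

class Centroid:
    """Toy nearest-centroid classifier standing in for a relation classifier."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            s = sums.setdefault(yi, [0.0] * len(xi))
            for d, v in enumerate(xi):
                s[d] += v
        self.c = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        return min(self.c, key=lambda c: dist2(x, self.c[c]))

def nf_tri_training(X, y, U, rounds=3, seed=0):
    """Under-sample, edit the initial data, then run simplified Tri-training
    with an editing pass over each round's enlarged training set."""
    rng = random.Random(seed)
    X, y = edit_noise(*undersample(X, y, rng))
    base = [bootstrap(X, y, rng) for _ in range(3)]
    clfs = [Centroid().fit(*s) for s in base]
    for _ in range(rounds):
        new_sets = []
        for i in range(3):
            a, b = (clfs[j] for j in range(3) if j != i)
            # Pseudo-label an unlabeled sample when the other two agree.
            agreed = [(u, a.predict(u)) for u in U if a.predict(u) == b.predict(u)]
            Xi = base[i][0] + [u for u, _ in agreed]
            yi = base[i][1] + [label for _, label in agreed]
            new_sets.append(edit_noise(Xi, yi))
        clfs = [Centroid().fit(*s) for s in new_sets]
    return clfs

def vote(clfs, x):
    """Majority vote of the three classifiers."""
    return Counter(c.predict(x) for c in clfs).most_common(1)[0][0]

# Toy data: 8 "other" points, 4 "spouse" points, 1 mislabeled "spouse" point.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.2, 0.6), (0.7, 0.3), (0.4, 0.9),
     (0.8, 0.8), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
y = ["other"] * 8 + ["spouse"] * 5
U = [(0.3, 0.4), (0.9, 0.2), (0.1, 0.9), (4.8, 5.2), (5.3, 4.7), (5.0, 5.5)]
clfs = nf_tri_training(X, y, U)
```

In this toy setup the mislabeled point at (0.5, 0.5) sits inside the "other" cluster, so the editing pass discards it before the classifiers are trained; `vote(clfs, x)` then gives the ensemble prediction for a new feature tuple `x`.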