Entity Relationship Extraction Technology Based on Syntactic and Semantic Features (基于句法语义特征的实体关系抽取技术)  Cited by: 3


Authors: YAO Chun-hua (姚春华)[1], LIU Xiao (刘潇)[2], GAO Hong-yi (高弘毅), YAN Qiu-xia (鄢秋霞)

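Affiliations: [1] No. 30 Institute of CETC, Chengdu, Sichuan 610041, China; [2] Election and Training Office of PLA in Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710061, China; [3] China Electronic Technology Network Information Security Co., Ltd., Chengdu, Sichuan 610041, China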

Source: Communications Technology (《通信技术》), 2018, No. 8, pp. 1828-1835 (8 pages)

Funding: National Key R&D Program of China (No. 2017YFC0820700)

Abstract: Entity relationship extraction converts unstructured data into structured data and is an important foundation for natural language processing tasks. This work targets six relationships between people: parent, child, spouse, sibling, colleague, and other. Because no corpus is available for these six relationships, a Baidu Encyclopedia corpus is first used to build a relation indicator word dictionary covering five categories (parent, child, spouse, sibling, colleague), and the relationship type of an entity pair is then determined from that dictionary. Using this method together with manual annotation, the corpus for the five categories is expanded. A series of features is designed according to the grammatical characteristics of Chinese, including the words and part-of-speech tags of the entities themselves and the word and part-of-speech features of the entities' context. In addition, the dependency syntax relation values of the entities and the distance between each entity and the core predicate are incorporated as features. A feature vector is constructed for each binary entity pair, and a logistic model is trained and tested on these vectors. For text that contains multiple binary entity pairs, the number of relation indicator words in the text is counted so that the number of binary entity pairs in a sentence does not exceed the number of relation indicator words. Experimental results show that both accuracy and recall reach 87% for recognizing relationships between people.
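The dictionary-based step and the pair-count constraint described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy English dictionary `INDICATORS`, the function names, the nearest-indicator heuristic used to rank candidate pairs, and the choice to label pairs as "other" when a sentence contains no indicator word.

```python
from itertools import combinations

# Hypothetical toy dictionary mapping indicator words to relation types.
# The paper builds its dictionary from a Baidu Encyclopedia corpus (Chinese text).
INDICATORS = {
    "father": "parent", "mother": "parent",
    "son": "child", "daughter": "child",
    "wife": "spouse", "husband": "spouse",
    "brother": "sibling", "sister": "sibling",
    "colleague": "colleague",
}


def find_indicators(tokens):
    """Return (position, relation) for every token found in the dictionary."""
    return [(i, INDICATORS[t]) for i, t in enumerate(tokens) if t in INDICATORS]


def extract_relations(tokens, entity_positions):
    """Label entity pairs, keeping at most one pair per indicator word.

    Assumption: a pair is matched to the indicator word closest to both
    entities, and the closest pairs are kept first. The paper states only
    that the number of pairs must not exceed the number of indicator words.
    """
    indicators = find_indicators(tokens)
    pairs = list(combinations(sorted(entity_positions), 2))
    if not indicators:
        # Assumption: with no indicator word, fall back to the "other" class.
        return [(tokens[a], tokens[b], "other") for a, b in pairs]

    scored = []
    for a, b in pairs:
        pos, rel = min(indicators, key=lambda ind: abs(ind[0] - a) + abs(ind[0] - b))
        scored.append((abs(pos - a) + abs(pos - b), a, b, rel))
    scored.sort()

    kept = scored[:len(indicators)]  # cap: pairs <= indicator words
    return [(tokens[a], tokens[b], rel) for _, a, b, rel in kept]


if __name__ == "__main__":
    sentence = ["Alice", "is", "the", "mother", "of", "Bob", "and", "Carol", "works", "nearby"]
    print(extract_relations(sentence, [0, 5, 7]))
```

In this toy sentence there are three person entities, hence three candidate pairs, but only one indicator word, so only one pair is kept. In the pipeline the abstract describes, pairs labeled this way (after manual correction) would then serve as training data for the feature-based logistic model.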

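Keywords: relation indicator word dictionary; entity relationship extraction; semantic features; syntactic dependency relation value; LOGISTIC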

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
