Research on Chinese-patents-oriented open entity relation extraction (面向中文专利的开放式实体关系抽取研究)  Cited by: 5

Authors: Zhao Qimeng (赵奇猛), Wang Peiyan (王裴岩)[1], Feng Haoguo (冯好国), Cai Dongfeng (蔡东风)[1]

Affiliation: [1] Knowledge Engineering Research Center, Shenyang Aerospace University (沈阳航空航天大学知识工程研究中心), Shenyang 110136

Source: Computer Engineering and Applications (《计算机工程与应用》), 2015, No. 1, pp. 125-129, 171 (6 pages in total)

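Funding: National Science and Technology Support Program of the "Twelfth Five-Year Plan" (No.2012BAH14F00); National Natural Science Foundation of China (No.61073123)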

Abstract: The main goal of information extraction is to transform unstructured or semi-structured text into structured information, and entity relation extraction is one of its major tasks. Traditional entity relation extraction requires relation types to be specified in advance and relies on hand-crafted extraction rules and manual labels, so it does not scale to large volumes of text. Open Information Extraction (OIE) addresses this problem and has made significant progress for English and other Western languages, but research on Chinese remains scarce. This paper therefore studies a hierarchical method for open entity relation extraction from Chinese patents that applies Markov Logic Networks (MLN) on top of chunk-level annotation, using both external (outer) and internal (inner) chunk tags. The experimental results show that taking chunks as the starting point lowers the difficulty of sentence understanding, and that the outer and inner chunk layers can be handled in a uniform way, which reduces engineering cost. Under the same feature set, MLN-based relation extraction outperforms a support vector machine (SVM); the F-scores for the external and internal layers reach 77.92% and 69.20% respectively.
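To make the Markov Logic Network idea in the abstract more concrete, the following is a minimal sketch of how an MLN scores a candidate relation between two chunks: each possible world receives a score equal to the sum of the weights of the formulas it satisfies, and the probability of a world is proportional to the exponential of that score. The two formulas, their weights, and the feature names `verb_between` and `far_apart` are hypothetical illustrations and are not taken from the paper.

```python
import math

# Hypothetical weighted first-order formulas, grounded for one chunk pair (c1, c2).
# Each entry is (weight, predicate) where predicate(evidence, rel) says whether
# the grounded formula is satisfied in the world where Rel(c1, c2) == rel.
# Neither the formulas nor the weights come from the paper.
FORMULAS = [
    # VerbBetween(c1, c2) => Rel(c1, c2)
    (1.5, lambda ev, rel: (not ev["verb_between"]) or rel),
    # FarApart(c1, c2) => not Rel(c1, c2)
    (0.8, lambda ev, rel: (not ev["far_apart"]) or (not rel)),
]


def world_score(ev, rel):
    """Sum of the weights of all formulas satisfied in this world."""
    return sum(weight for weight, formula in FORMULAS if formula(ev, rel))


def prob_relation(ev):
    """P(Rel(c1, c2) = true | evidence), normalised over the two possible worlds."""
    s_true = math.exp(world_score(ev, True))
    s_false = math.exp(world_score(ev, False))
    return s_true / (s_true + s_false)


if __name__ == "__main__":
    evidence = {"verb_between": True, "far_apart": False}
    print(prob_relation(evidence))
```

With a single query atom the normalisation runs over only two worlds, so the result reduces to a logistic function of the weight difference. A real system would ground many atoms jointly and learn the weights from annotated data.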

Keywords: Chinese patent dependency treebank; open entity relation extraction; Markov logic network

Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
