An information extraction method for scientific literature introductions  Cited by: 6


Authors: ZHU Liping[1,2], LI Hongqi[1,2], YANG Zhongguo[1,2], LIU Qiang[1,2]

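Affiliations: [1] Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249 [2] College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249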

Source: Journal of Shandong University (Natural Science), 2015, No. 7, pp. 23-30, 37 (9 pages)

Funding: China University of Petroleum (Beijing) Fund Project (KYJJ2012-05-25); National Major Science and Technology Project (2011ZX05023-005-06; 2011ZX05020-007-007)

Abstract: The writing model of the introduction section is analyzed, and the text is divided at the sentence level into three categories: background knowledge, problem analysis, and work description. For each category, the guide words, sentence patterns, clue words, and sentence positions are tallied from a large number of sentences in scientific articles, and a corresponding rule base is built. After word segmentation and part-of-speech tagging, each sentence is matched against the rules to determine its category, so that the information of the three parts can be extracted. Taking scientific literature on petroleum exploration and development and on data mining as examples, manual classification is compared with extraction by the proposed method. The results show that the method accurately extracts all three types of information.
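The rule-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue-word patterns, the position thresholds, and the names `RULES`, `split_sentences`, `classify`, and `extract` are all hypothetical, and the sketch works on English text with a naive sentence splitter instead of the word segmentation and part-of-speech tagging used in the paper.

```python
import re

# Hypothetical cue-word rules. The paper's actual rule base is not
# reproduced in the abstract, so these patterns are assumptions.
RULES = [
    ("work_description",
     re.compile(r"\b(this paper|we propose|we present|in this work)\b", re.I)),
    ("problem_analysis",
     re.compile(r"\b(however|but|limitation|problem|difficult|lack)\b", re.I)),
    ("background",
     re.compile(r"\b(in recent years|widely used|has become)\b", re.I)),
]


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def classify(sentence, index, total):
    """Return the category of one sentence.

    Cue-word rules are tried first, in the order listed in RULES.
    If none matches, the relative position decides (an assumption):
    first third -> background, last third -> work description,
    otherwise problem analysis.
    """
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    relative = index / max(total - 1, 1)
    if relative < 1 / 3:
        return "background"
    if relative > 2 / 3:
        return "work_description"
    return "problem_analysis"


def extract(text):
    """Group the sentences of an introduction by category."""
    sentences = split_sentences(text)
    result = {"background": [], "problem_analysis": [], "work_description": []}
    for i, sentence in enumerate(sentences):
        result[classify(sentence, i, len(sentences))].append(sentence)
    return result
```

Calling `extract` on an introduction returns a dictionary with one list of sentences per category. A real rule base would also need the sentence-pattern features mentioned in the abstract, which this sketch leaves out.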

Keywords: scientific literature; information extraction; background knowledge; clue words

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
