Authors: ZHU Liping[1,2], LI Hongqi[1,2], YANG Zhongguo[1,2], LIU Qiang[1,2]
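Affiliations: [1] Beijing Key Laboratory of Petroleum Data Mining, China University of Petroleum (Beijing), Beijing 102249; [2] College of Geophysics and Information Engineering, China University of Petroleum (Beijing), Beijing 102249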
Source: Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》), 2015, no. 7, pp. 23-30, 37 (9 pages in total)
Funding: Science Foundation of China University of Petroleum (Beijing) (KYJJ2012-05-25); National Science and Technology Major Project (2011ZX05023-005-06; 2011ZX05020-007-007)
Abstract: Based on an analysis of how the introductions of scientific papers are written, the text of an introduction is classified at the sentence level into three categories: background knowledge, problem analysis, and work description. Sentences in each category are characterized by four features: guide words, sentence patterns, clue words, and sentence position. Statistics of these features, gathered from a large number of sentences in scientific articles, are used to build a rule base in which each rule identifies one sentence type. After word segmentation and part-of-speech tagging, each sentence is matched against the rules to determine its category, and the information in each of the three parts is thereby extracted. Extraction experiments were conducted on scientific literature in petroleum exploration and development and in data mining, comparing the automatically extracted results with manual classification. The results show that the method extracts all three types of information accurately.
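The abstract names the features and the matching step but not the rule format, so the following is only a minimal Python sketch under stated assumptions. The `Rule` fields, the English cue words and regexes in `RULE_BANK`, the double weight on guide words, the position windows, and the "inherit the previous label" fallback are all hypothetical illustrations, not the authors' rules. The paper works on segmented, part-of-speech-tagged Chinese text; here lowercasing plus substring and regex matching stands in for that preprocessing. The `precision_recall` helper mirrors the comparison with manual classification, but the abstract does not say which metric the authors used.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

CATEGORIES = ("background", "problem", "work")


@dataclass
class Rule:
    """A hypothetical rule; the paper's exact rule format is not given in the abstract."""
    category: str
    guide_words: Tuple[str, ...] = ()   # must open the sentence
    clue_words: Tuple[str, ...] = ()    # may appear anywhere in the sentence
    patterns: Tuple[str, ...] = ()      # regexes standing in for sentence patterns
    position: Tuple[float, float] = (0.0, 1.0)  # allowed relative position in the introduction

    def score(self, sentence: str, rel_pos: float) -> int:
        lo, hi = self.position
        if not lo <= rel_pos <= hi:
            return 0  # the position feature vetoes this rule
        s = 2 * sum(sentence.startswith(g) for g in self.guide_words)  # guide words weigh double (assumption)
        s += sum(c in sentence for c in self.clue_words)
        s += sum(1 for p in self.patterns if re.search(p, sentence))
        return s


# Hypothetical English rule bank; the paper derives its rules from statistics over Chinese sentences.
RULE_BANK = [
    Rule("background", clue_words=("in recent years", "widely"),
         patterns=(r"has been \w+ (used|studied)",), position=(0.0, 0.6)),
    Rule("problem", guide_words=("however",),
         clue_words=("remains", "difficult", "limitation"), position=(0.0, 0.9)),
    Rule("work", guide_words=("in this paper", "this paper"),
         clue_words=("we propose", "we present"), position=(0.4, 1.0)),
]


def classify(sentences: List[str], rules: List[Rule] = RULE_BANK) -> List[str]:
    """Label each sentence with the category whose rules score highest."""
    labels: List[str] = []
    n = len(sentences)
    for i, raw in enumerate(sentences):
        text = raw.lower()  # stands in for segmentation and POS tagging
        rel_pos = i / (n - 1) if n > 1 else 0.0
        scores = {c: 0 for c in CATEGORIES}
        for rule in rules:
            scores[rule.category] += rule.score(text, rel_pos)
        best = max(CATEGORIES, key=lambda c: scores[c])  # ties go to the earlier category
        if scores[best] == 0:
            # No rule fired: inherit the previous label (an assumed fallback).
            best = labels[-1] if labels else "background"
        labels.append(best)
    return labels


def extract(sentences: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group sentences into the three parts of the introduction."""
    parts: Dict[str, List[str]] = {c: [] for c in CATEGORIES}
    for s, label in zip(sentences, labels):
        parts[label].append(s)
    return parts


def precision_recall(pred: List[str], gold: List[str]) -> Dict[str, Tuple[float, float]]:
    """Per-category precision and recall of automatic labels against manual ones."""
    result = {}
    for c in CATEGORIES:
        tp = sum(p == c and g == c for p, g in zip(pred, gold))
        fp = sum(p == c and g != c for p, g in zip(pred, gold))
        fn = sum(p != c and g == c for p, g in zip(pred, gold))
        result[c] = (tp / (tp + fp) if tp + fp else 0.0,
                     tp / (tp + fn) if tp + fn else 0.0)
    return result


if __name__ == "__main__":
    intro = [
        "Seismic data processing has been widely studied in recent years.",
        "However, extracting key information from long reports remains difficult.",
        "In this paper we propose a rule-based extraction method.",
    ]
    labels = classify(intro)
    print(labels)  # ['background', 'problem', 'work']
    print(extract(intro, labels)["work"])
    print(precision_recall(labels, ["background", "problem", "work"]))
```

Summing rule scores per category, rather than stopping at the first matching rule, lets weak cues from several features reinforce one another. This is one plausible reading of "matching each sentence with rules", not necessarily the authors' choice.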
CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]