Extraction of Method Knowledge Elements in Scientific Literature

Cited by: 11

Authors: WANG Zhong-yi [1]; SHEN Xue-ying; HUANG Jing [2]

Affiliations: [1] School of Information Management, Central China Normal University, Wuhan 430079, Hubei, China; [2] Wuhan Polytechnic, Wuhan 430072, Hubei, China

Source: Information Science (《情报科学》), 2021, No. 1, pp. 13-20 (8 pages)

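Funding: Youth Fund for Humanities and Social Sciences Research of the Ministry of Education, "Multi-granularity knowledge organization of fragmented user-generated content in the big data environment" (19YJC870025).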

Abstract: 【Purpose/significance】 This study aims to accurately extract method knowledge elements (KEs) from scientific literature, enabling finer-grained knowledge organization and retrieval. 【Method/process】 The study proposes a rule-based approach to extracting method KEs, divided into two stages: (1) semi-automated identification of the initial description rules of method KEs, and (2) automated extraction and updating of method KEs and their description rules. In the first stage, high-quality method KEs are extracted semi-automatically, combining manual and machine work, on the basis of their descriptive characteristics, and their composition dimensions and initial description rules are then summarized. This stage provides the data foundation for the second stage and offers further insight into the composition dimensions of method KEs. In the second stage, the initial rules are treated as clue words and regular expressions are used to extract method KEs from text; the PrefixSpan algorithm then mines additional description rules from the newly identified method KEs to supplement the initial rules, so that method KEs and their description rules are updated dynamically. 【Result/conclusion】 In a preliminary evaluation on 16 scientific papers, precision (P), recall (R), and F-value were 0.71, 0.80, and 0.73, respectively (all > 0.5), indicating that the method is feasible and effective; the approach may also serve as a reference for finer-grained knowledge organization and retrieval. 【Innovation/limitation】 A limitation is that extracting the description rules of method KEs still requires some manual involvement.
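The second stage described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual description rules, tokenization, or support threshold, so everything concrete here is an assumption for illustration only: the clue words in `CLUE_WORDS`, the sentence splitter, and the function names `extract_candidates` and `prefixspan` are hypothetical. The miner is a simplified PrefixSpan that handles sequences of single items (tokens), which may differ from the variant the authors used.

```python
import re
from collections import defaultdict

# Hypothetical clue words; the paper's real rule set is not given in the abstract.
CLUE_WORDS = ["we propose", "is based on", "is divided into", "by using"]


def extract_candidates(text, clue_words=CLUE_WORDS):
    """Return the sentences that contain at least one clue word (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(w) for w in clue_words), re.IGNORECASE)
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern.search(s)]


def prefixspan(sequences, min_support):
    """Simplified PrefixSpan over sequences of single items.

    Returns a dict mapping each frequent pattern (a tuple of items)
    to the number of input sequences that contain it as a subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # Count, for each item, how many projected suffixes contain it.
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Project: keep what follows the first occurrence of the item.
            new_projected = [
                suffix[suffix.index(item) + 1:]
                for suffix in projected
                if item in suffix
            ]
            mine(new_prefix, new_projected)

    mine((), [list(s) for s in sequences])
    return results


if __name__ == "__main__":
    text = (
        "We propose a rule-based extractor. The weather was nice. "
        "The method is divided into two stages."
    )
    hits = extract_candidates(text)
    print(hits)
    # Token sequences from the extracted sentences feed the pattern miner.
    tokens = [s.lower().rstrip(".").split() for s in hits]
    print(prefixspan(tokens, min_support=2))
```

In this sketch, frequent token patterns returned by `prefixspan` would be reviewed and, if useful, added to the clue-word list, which mirrors the rule-updating loop the abstract describes only in outline.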

Keywords: scientific literature; method knowledge element; description rule; automatic extraction; PrefixSpan

Classification code: G254 [Cultural Sciences: Library Science]

 
