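Affiliations: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190 [2] University of Chinese Academy of Sciences, Beijing 100049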
Source: 《现代图书情报技术》 (New Technology of Library and Information Service), 2014, No. 3, pp. 73-79 (7 pages)
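Funding: One of the research outputs of the National Key Technology R&D Program sub-project "Research and Demonstration of Domain Academic Relationships Based on Literature Knowledge Networks" (Grant No. 2011BAH10B06-04)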
Abstract: [Objective] Scholarly article outlines are succinct and hierarchical. This paper studies a method for extracting important and meaningful phrases from them. [Methods] The method first combines linguistic rules with terminology dictionaries to identify a set of candidate phrases in the headings at every level of the outline. It then calculates tf-idf based on the syntactic dependencies between phrases and quantifies a hierarchical feature from the outline structure. Finally, it combines tf-idf with the hierarchical feature to rank the candidate phrases and select the keyphrases. [Results] Experiments show that the F-score of candidate phrase identification reaches 89.57% and the F-score of candidate phrase selection reaches 36.89%. [Limitations] The phrase extraction rules are incomplete, and the weights used in the tf-idf calculation are set from empirical values only, so the results are not optimal. [Conclusions] The method can effectively extract keyphrases from outlines and is suitable for keyphrase extraction from hierarchical structures.
Keywords: Candidate phrase identification; Candidate phrase selection; Syntactic dependency; Hierarchical feature
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
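Illustration: the ranking step summarized in the abstract (combine tf-idf with a hierarchical feature, then rank the candidates) might look roughly like the sketch below. This is not the authors' implementation. It uses plain smoothed tf-idf rather than the dependency-based tf-idf the paper describes, and the level weight `1 / depth`, the mixing parameter `alpha`, and the function name `rank_outline_terms` are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def rank_outline_terms(outlines, alpha=0.5):
    """Rank candidate phrases in each outline (hypothetical sketch).

    outlines: list of documents; each document is a list of
              (depth, [candidate phrases]) pairs, one per heading,
              where depth 1 is a top-level heading.
    alpha:    assumed mixing weight between tf-idf and the level feature.
    Returns one list per document of (phrase, score), best first.
    """
    n_docs = len(outlines)

    # Document frequency: in how many outlines does each phrase occur?
    df = defaultdict(int)
    for doc in outlines:
        for phrase in {p for _, phrases in doc for p in phrases}:
            df[phrase] += 1

    rankings = []
    for doc in outlines:
        tf = defaultdict(int)
        level = {}
        for depth, phrases in doc:
            for p in phrases:
                tf[p] += 1
                # Assumed hierarchical feature: shallower headings weigh more;
                # a phrase keeps the weight of its shallowest occurrence.
                level[p] = max(level.get(p, 0.0), 1.0 / depth)

        total = sum(tf.values())
        # Plain smoothed tf-idf (a stand-in for the paper's variant).
        tfidf = {
            p: (tf[p] / total) * (math.log((1 + n_docs) / (1 + df[p])) + 1)
            for p in tf
        }
        top = max(tfidf.values(), default=1.0)

        # Linear combination of normalized tf-idf and the level feature.
        scores = {
            p: alpha * (tfidf[p] / top) + (1 - alpha) * level[p] for p in tf
        }
        rankings.append(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))
    return rankings
```

Normalizing tf-idf by its per-document maximum keeps both features in the range 0 to 1, so `alpha` behaves as a simple trade-off between frequency evidence and heading depth. The abstract notes that the paper's own weights were set empirically, so any concrete value of `alpha` here is equally a placeholder.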