检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:朱泽德[1,2] 李淼[2] 张健[2] 曾伟辉[2] 曾新华[2]
机构地区:[1]中国科学技术大学自动化系,安徽合肥230026 [2]中国科学院合肥智能机械研究所,安徽合肥230031
出 处:《中南大学学报(自然科学版)》2015年第6期2142-2148,共7页Journal of Central South University:Science and Technology
基 金:模式识别国家重点实验室开放课题基金资助项目(201306320);中国科学院信息化专项(XXH12504-1-10);国家自然科学基金资助项目(61070099)~~
摘 要:为解决现有方法未能综合考察文档主题的全面性、关键词的可读性以及差异性,提出一种基于文档隐含主题的关键词抽取新算法TFITF。算法根据大规模语料产生隐含主题模型计算词汇对主题的TFITF权重并进一步产生词汇对文档的权重,利用共现信息排序和选择相邻词汇形成候选关键短语,再使用相似性排除隐含主题一致的冗余短语。此外,从文档统计信息、词汇链和主题分析3方面来进行关键词抽取的对比测试,实验在1 040篇中文摘要及5 408个关键词构成的测试集上展开。结果表明,算法有效地提高文档关键词抽取的准确率与召回率。Due to the shortage of the comprehensive analysis of the coverage of document topics, the readability and difference of keyphrases, a new algorithm of keyphrase extraction TFITF based on the implicit topic model was put forward. The algorithm adopted the large-scale corpus and producted latent topic model to calculate the TFITF weight of vocabulary on the topic and further generate the weight of vocabulary on the document. And adjacent lexical was ranked and picked out as candidate keyphrases based on co-occurrence information. Then according to the similarity of vocabulary topics, redundant phrases were eliminated. In addition, the comparative experiments of candidate keyphrases were executed by document statistical information, vocabulary chain and topic information. The experimental results, which were carried out on an evaluation dataset including 1 040 Chinese documents and 5 408 standard keyphrases, demonstrate that the method can effectively improve the precision and recall of keyphrase extraction.
分 类 号:TP391[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.28