检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:李鹏[1] 王斌[1] 石志伟[1] 崔雅超[1] 李恒训[1]
出 处:《计算机研究与发展》2012年第11期2344-2351,共8页Journal of Computer Research and Development
基 金:国家自然科学基金项目(60776797;60873166);国家"九七三"重点基础研究发展计划基金项目(2007CB311103);国家"八六三"高技术研究发展计划基金项目(2006AA010105)
摘 要:关键词抽取是从文本中抽取代表性关键词的过程,在文本处理领域中具有重要的应用价值.利用一种近年来受到广泛关注的新的信息源——社会化标签(tag)——来提高网页关键词抽取的质量.通过对Tag数据进行统计分析,发现用户往往对多个在话题上相关的网页使用同样的标签词,一个特定的文档可以通过其标注信息找到相关文档.在此基础上,提出了利用Tag进行关键词抽取的框架,并给出了一种具体的实现方法Tag-TextRank.该方法在TextRank基础上,通过目标文档中的每个Tag引入相关文档来估计词项图的边权重并计算得到词项的重要度,最后将不同Tag下的词项权重计算结果进行融合.在公开语料上的实验表明,Tag-TextRank在各项评价指标上均优于经典的关键词抽取方法TextRank,并具有很好的推广性.Keyword extraction is to extract representative keywords from texts and has been widely used in most text processing applications. In this paper, we explore the use of tags for improving the performance of webpage keyword extraction task. Specifically, we first analyze the characteristics of bookmarking behavior and find that people usually use the same tags to label multiple topic-related webpages, which is shown by the fact that over 90~ of labeled webpages can find relevant webpages through their tag information. Based on the discovery, we propose a method called Tag-TextRank. As an extension of the classic keyword extraction method TextRank, Tag-TextRank calculates the term importance based on a weighted term graph and the edge weight for a term pair is estimated by the statistics of the relevant documents which are introduced by a certain tag of the target webpage. The final importance score for a term is the combination of the above tag dependent importance scores. Tag-TextRank can measure the term relations by utilizing more documents so as to better estimate the term importance. Experimental results on a publicly available corpus show that Tag- TextRank outperforms TextRank on various metrics.
关 键 词:社会化标注 标签 关键词抽取 网页关键词抽取 TextRank
分 类 号:TP391.3[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.222