Source: 《情报学报》 (Journal of the China Society for Scientific and Technical Information), 2011, No. 8, pp. 819-825 (7 pages)
Funding: Major Project of the Key Research Bases for Humanities and Social Sciences, Ministry of Education (No. 07JJD870220)
Abstract: The rapid growth of the Internet makes it increasingly difficult to improve the effectiveness of information retrieval. This paper first extracts keywords from each document with a specific algorithm, then uses statistical methods to count the keywords that co-occur across documents and builds a co-occurrence keyword label matrix, and finally applies a hierarchical clustering algorithm to the co-occurrence keyword labels, producing a hierarchical label tree from which the document clusters are constructed. The method effectively classifies the results returned by the source search engine, so that users can view information related to their query at a higher topic level and accurately locate the information that interests them. A comparison with the Lingo algorithm shows that the labels produced by the proposed algorithm are more readable and more general, and the F-measure also indicates some improvement in text clustering quality.
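The three stages named in the abstract (keyword extraction, co-occurrence counting, hierarchical clustering of labels) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the keyword extraction algorithm, the linkage criterion, or any stopping rule, so the frequency-based extractor, the average-linkage similarity, the `threshold` parameter, and every function name below are assumptions chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical stopword list; the paper's extractor is not specified.
STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is", "on", "with"}

def extract_keywords(text, k=3):
    """Stand-in extractor: the k most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return {word for word, _ in Counter(tokens).most_common(k)}

def cooccurrence(keyword_sets):
    """For each keyword pair, count the documents that contain both."""
    counts = Counter()
    for kws in keyword_sets:
        for a, b in combinations(sorted(kws), 2):
            counts[(a, b)] += 1
    return counts

def similarity(c1, c2, counts):
    """Average co-occurrence count between two keyword clusters (assumed linkage)."""
    total = sum(counts[tuple(sorted((a, b)))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def cluster_labels(keyword_sets, threshold=0.5):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    keyword clusters until no pair reaches the (assumed) threshold."""
    counts = cooccurrence(keyword_sets)
    clusters = [frozenset([k]) for k in sorted(set().union(*keyword_sets))]
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),
                   key=lambda pair: similarity(pair[0], pair[1], counts))
        if similarity(best[0], best[1], counts) < threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters

def assign_documents(keyword_sets, clusters):
    """Assign each document to the label cluster sharing the most keywords."""
    return [max(range(len(clusters)), key=lambda i: len(kws & clusters[i]))
            for kws in keyword_sets]

if __name__ == "__main__":
    docs = [
        "car engine car engine repair",
        "engine car engine car tuning",
        "cat zoo cat zoo keeper",
        "zoo cat zoo cat feeding",
    ]
    keyword_sets = [extract_keywords(d, k=2) for d in docs]
    clusters = cluster_labels(keyword_sets)
    print(clusters)
    print(assign_documents(keyword_sets, clusters))
```

Each merge recorded by `cluster_labels` corresponds to one internal node of a label tree; a fuller version would keep the merge history rather than only the final flat clusters.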