Study on Clustering of Retrieval Results Based on Co-occurrence Analysis of Keywords

Times cited: 9

Authors: Li Fenglin[1], He Zhoufang[1]

Affiliation: [1] Center for Studies of Information Resources, Wuhan University, Wuhan 430072

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2011, no. 8, pp. 819-825 (7 pages)

Funding: Major Project of the Key Research Bases for Humanities and Social Sciences, Ministry of Education (No. 07JJD870220)

Abstract: With the rapid expansion of the Internet, improving the effectiveness of information retrieval has become considerably harder. This paper first extracts the keywords of each document with a specific algorithm. It then applies statistical methods to count the keywords that co-occur across different documents and builds the corresponding co-occurrence keyword label matrix. Finally, it clusters the co-occurrence keyword labels with a hierarchical clustering algorithm, producing a hierarchical label tree from which the document clusters are constructed. The method can effectively organize the results returned by a source search engine into categories, so that users can view information related to a query at a higher topical level and precisely locate the information they are interested in. Compared with the Lingo algorithm, the labels generated by the proposed algorithm are more readable and more general, and the F-measure evaluation index also shows that the algorithm improves the quality of text clustering to some extent.
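The pipeline summarized above (keyword co-occurrence counting, hierarchical clustering of keyword labels, document clusters derived from the label groups, F-measure evaluation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the keyword-extraction algorithm, the similarity measure, the linkage criterion, or the stopping rule, so the sketch assumes keywords are already extracted per document, uses a Jaccard-normalised co-occurrence similarity, average linkage, and a hypothetical `threshold=0.3`. The evaluation uses the common class-weighted clustering F-measure, which may differ from the exact variant used in the paper. All function names are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(doc_keywords):
    """Build a keyword-by-keyword matrix of document co-occurrence counts.

    The diagonal holds each keyword's document frequency; cell (i, j) counts
    the documents whose keyword set contains both keyword i and keyword j.
    """
    labels = sorted({k for kws in doc_keywords for k in kws})
    index = {k: i for i, k in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for kws in doc_keywords:
        unique = sorted(set(kws))
        for k in unique:
            matrix[index[k]][index[k]] += 1
        for a, b in combinations(unique, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return labels, matrix


def jaccard(matrix, i, j):
    """Assumed similarity: co-occurrences / documents containing either keyword."""
    both = matrix[i][j]
    either = matrix[i][i] + matrix[j][j] - both
    return both / either if either else 0.0


def cluster_labels(labels, matrix, threshold=0.3):
    """Average-linkage agglomerative clustering of keyword labels.

    Returns the label groups left once no pair of clusters reaches the
    (hypothetical) threshold, plus the merge history, i.e. the label tree.
    """
    clusters = [[i] for i in range(len(labels))]
    merges = []

    def link(a, b):
        return sum(jaccard(matrix, i, j) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            s = link(clusters[x], clusters[y])
            if s > best:
                best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting y leaves index x valid
        merges.append(([labels[i] for i in clusters[x]],
                       [labels[i] for i in clusters[y]], best))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[labels[i] for i in c] for c in clusters], merges


def document_clusters(doc_keywords, label_groups):
    """Put each document into every label group it shares a keyword with."""
    return [{d for d, kws in enumerate(doc_keywords) if set(kws) & set(group)}
            for group in label_groups]


def f_measure(classes, clusters, n_docs):
    """Class-size-weighted best-match F-measure of clusters against gold classes."""
    total = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            overlap = len(cls & clu)
            if overlap:
                p, r = overlap / len(clu), overlap / len(cls)
                best = max(best, 2 * p * r / (p + r))
        total += len(cls) / n_docs * best
    return total


docs = [
    ["clustering", "search engine", "label"],
    ["clustering", "label", "hierarchical"],
    ["library", "catalog"],
    ["library", "catalog", "metadata"],
]
labels, matrix = cooccurrence_matrix(docs)
groups, merges = cluster_labels(labels, matrix, threshold=0.3)
print(groups)
# [['catalog', 'library', 'metadata'], ['clustering', 'label', 'hierarchical', 'search engine']]
print(f_measure([{0, 1}, {2, 3}], document_clusters(docs, groups), len(docs)))
# 1.0
```

Because a document joins every label group it shares a keyword with, the resulting document clusters may overlap. The `merges` list records the order in which label groups were joined, which is one simple way to represent a hierarchical label tree.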

Keywords: co-occurrence; clustering; retrieval results

CLC number: G254 [Culture and Science: Library Science]

 
