ISTC: A New Method for Clustering Search Results (Cited by: 2)

Authors: ZHANG Wei, XU Baowen, ZHANG Weifeng, XU Junling

Affiliations: [1] School of Computer Science and Engineering, Southeast University, Nanjing 211189, Jiangsu, China; [2] State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, Hubei, China; [3] Department of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China

Source: Wuhan University Journal of Natural Sciences (English-language natural sciences edition of the Journal of Wuhan University), 2008, No. 4, pp. 501-504 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086); Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714); Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082); National Natural Science Foundation of Jiangsu (BK2006094).

Abstract: A new common-phrase scoring method is proposed that combines term frequency-inverse document frequency (TFIDF) with the independence of the phrase. Combining the two properties helps identify more reasonable common phrases, which improves the accuracy of clustering. An equation for measuring the independence of a phrase is also proposed. The new algorithm, which improves on the suffix tree clustering (STC) algorithm, is named improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system was implemented and used to cluster several groups of web search results obtained from the Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
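The abstract does not reproduce the scoring equations, so the following is only a rough sketch of the general idea of scoring a common phrase by both TFIDF and independence. Everything in it is an assumption made for illustration: the function names `ngrams` and `score_phrases`, the smoothed idf term, the product used to combine the two factors, and above all the independence measure, which here is approximated by how varied the words immediately before and after a phrase are. It is not the equation proposed in the paper.

```python
import math
from collections import defaultdict

def ngrams(tokens, max_len=3):
    """Yield (start, phrase) for every word n-gram of up to max_len words."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield i, tuple(tokens[i:i + n])

def score_phrases(snippets, max_len=3):
    """Score phrases shared by >= 2 snippets (hypothetical scoring, see lead-in)."""
    docs = [s.lower().split() for s in snippets]
    tf = defaultdict(int)     # total occurrences of each phrase
    df = defaultdict(set)     # indices of the snippets containing the phrase
    left = defaultdict(set)   # distinct contexts just before the phrase
    right = defaultdict(set)  # distinct contexts just after the phrase
    for d, tokens in enumerate(docs):
        for i, p in ngrams(tokens, max_len):
            tf[p] += 1
            df[p].add(d)
            # A snippet boundary counts as a context unique to that snippet.
            left[p].add(tokens[i - 1] if i > 0 else ("<s>", d))
            j = i + len(p)
            right[p].add(tokens[j] if j < len(tokens) else ("</s>", d))
    n_docs = len(docs)
    scores = {}
    for p in tf:
        if len(df[p]) < 2:
            continue  # not a common phrase
        # Assumed TFIDF variant: raw count times a smoothed idf.
        tfidf = tf[p] * math.log(1 + n_docs / len(df[p]))
        # Assumed independence: context variety in (0, 1]; it is low when the
        # phrase is almost always a fragment of the same longer phrase.
        independence = (len(left[p]) + len(right[p])) / (2 * tf[p])
        scores[" ".join(p)] = tfidf * independence
    return scores

if __name__ == "__main__":
    demo = [
        "suffix tree clustering groups results",
        "a suffix tree clustering method",
        "fast suffix tree clustering works",
    ]
    for phrase, s in sorted(score_phrases(demo).items(), key=lambda kv: -kv[1]):
        print(f"{s:.3f}  {phrase}")
```

In this toy input, "suffix tree" and "suffix tree clustering" have identical TFIDF values, but "suffix tree" is always followed by "clustering", so its assumed independence is lower and the complete phrase ranks first. That illustrates why a second factor beyond TFIDF can favor more reasonable common phrases.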

Keywords: Web search results clustering; suffix tree; term frequency-inverse document frequency (TFIDF); independence of phrases

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
