New text clustering algorithm based on CF tree and KNN graph partition

Cited by: 5


Authors: YANG Xiaofu (仰孝富), QI Jiandong (齐建东)[1], JI Pengfei (吉鹏飞)[1], ZHU Wenfei (朱文飞)[1]

Affiliation: [1] School of Information, Beijing Forestry University, Beijing 100083

Source: Computer Engineering and Applications (《计算机工程与应用》), 2015, No. 6, pp. 114-119 (6 pages)

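Funding: Twelfth Five-Year Plan science and technology project "Construction and application demonstration of a knowledge organization system for foreign-language scientific and technical literature" (No.2011BAH10B04); key project of the State Forestry Administration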

Abstract: To improve text clustering quality and to address the weaknesses of traditional clustering algorithms in parameter setting and stability, a new text clustering algorithm, TCBIBK (a Text Clustering algorithm Based on Improved BIRCH and K-nearest neighbor), is proposed. TCBIBK takes the BIRCH clustering algorithm as its prototype. During clustering, in addition to evaluating the distance between a text object and a cluster, it also evaluates the distance between clusters, actively merges or splits clusters, and sets a dynamic threshold. Combined with the KNN classification algorithm, it improves clustering stability while maintaining good clustering efficiency. Applied to text clustering, TCBIBK improves clustering quality. Comparative experiments show that both the validity and the stability of clustering are considerably improved.
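The abstract names two distance checks, object-to-cluster and cluster-to-cluster, on top of BIRCH. The sketch below illustrates only the standard BIRCH clustering feature (N, linear sum, squared sum) and those two checks; it is not the authors' TCBIBK implementation. It assumes a flat list of clusters instead of a CF tree, fixed rather than dynamic thresholds, and omits the KNN step. The names `ClusteringFeature`, `insert`, `merge_closest_pair`, `threshold`, and `merge_threshold` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """BIRCH-style summary of a set of vectors: count, linear sum, squared sum."""
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                         # sum of squared norms

    def add(self, x):
        """Fold one vector into the summary."""
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other):
        """CF additivity: the summary of a union is the component-wise sum."""
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        """Root-mean-square distance of members from the centroid."""
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def insert(clusters, x, threshold):
    """Object-to-cluster check: absorb x into the nearest cluster if the
    resulting radius stays within threshold, otherwise start a new cluster."""
    point = ClusteringFeature()
    point.add(x)
    if clusters:
        idx = min(range(len(clusters)),
                  key=lambda i: math.dist(clusters[i].centroid(), x))
        trial = clusters[idx].merge(point)
        if trial.radius() <= threshold:
            clusters[idx] = trial
            return
    clusters.append(point)


def merge_closest_pair(clusters, merge_threshold):
    """Cluster-to-cluster check: merge the two nearest clusters if their
    centroid distance is within merge_threshold (assumed fixed here)."""
    if len(clusters) < 2:
        return clusters
    pairs = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))]

    def dist(p):
        return math.dist(clusters[p[0]].centroid(), clusters[p[1]].centroid())

    i, j = min(pairs, key=dist)
    if dist((i, j)) > merge_threshold:
        return clusters
    merged = clusters[i].merge(clusters[j])
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


clusters = []
for vec in [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]:
    insert(clusters, vec, threshold=0.5)
```

In a text clustering setting the vectors would be document vectors from a vector space model (for example TF-IDF weights); the toy two-dimensional points here only exercise the bookkeeping.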

Keywords: text clustering; vector space model; Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH); K-nearest neighbor

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
