Dynamic analysis of hierarchical agglomerative clustering algorithm and design of criterion functions  (Cited by: 1)


Authors: Wang Yang (王洋)[1,2], Tu Dengbiao (涂登彪)[3], An Mingyuan (安明远)[1], Sun Ninghui (孙凝晖)[2], Wang Weiping (王伟平)[2]

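Affiliations: [1] Graduate University of Chinese Academy of Sciences, Beijing 100190; [2] Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing 100190; [3] National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029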

Source: Chinese High Technology Letters (《高技术通讯》), 2012, No. 11, pp. 1169-1175 (7 pages)

Funding: Supported by the 863 Program (2009AA01A129) and the National Natural Science Foundation of China (60903047).

Abstract: To improve the efficiency and result quality of the hierarchical agglomerative clustering (HAC) algorithm, a dynamic analysis of the algorithm was conducted to study how one merge influences subsequent merges. The analysis shows that merging two clusters generates a new cluster and reduces by one the number of neighbors of each neighbor shared by the two merged clusters. When the new cluster, or a cluster whose neighbor count has decreased, takes part in a subsequent merge, execution efficiency is affected. A merge also changes the criterion function values between the merged clusters and their candidate neighbors, and thus influences how much subsequent merges improve quality. Based on this analysis and the definition of modularity, the influence of existing criterion functions on the agglomeration process and their limitations were investigated, and two new criterion functions were designed. Experiments on a large number of datasets show that the new criterion functions improve the efficiency and result quality of the HAC algorithm.
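The neighbor-count effect described in the abstract can be illustrated with a minimal sketch. It assumes clusters are nodes of an adjacency graph stored as a dictionary of sets; the function name `merge_clusters` and the toy graph are hypothetical and are not taken from the paper, which does not give its data structures in this record.

```python
def merge_clusters(adj, a, b, new):
    """Merge clusters a and b into `new` in a cluster-adjacency graph.

    adj maps each cluster id to the set of its neighbor ids (assumed layout).
    Returns the set of neighbors that a and b shared before the merge.
    """
    shared = adj[a] & adj[b]
    # Neighbors of the new cluster: everything adjacent to a or b, minus a and b.
    neighbors = (adj[a] | adj[b]) - {a, b}
    for n in neighbors:
        adj[n] -= {a, b}   # drop the old edges to a and/or b
        adj[n].add(new)    # add a single edge to the merged cluster
    del adj[a], adj[b]
    adj[new] = neighbors
    return shared


# Toy graph: C is adjacent to both A and B, D only to A (and C).
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}
before = {k: len(v) for k, v in adj.items()}
shared = merge_clusters(adj, "A", "B", "AB")
after = {k: len(v) for k, v in adj.items()}

for n in sorted(after):
    if n in before:
        print(n, before[n], "->", after[n])
```

In this toy graph the shared neighbor C loses one neighbor, because its two edges to A and B collapse into one edge to the merged cluster, while D, adjacent to only one of the merged clusters, keeps its count.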

Keywords: hierarchical agglomerative clustering (HAC) algorithm; criterion function; modularity; cluster analysis

Classification: TP13 [Automation and Computer Technology: Control Theory and Control Engineering]

 
