基于约束动态更新的半监督层次聚类算法  被引量:20

A Semi-supervised Agglomerative Hierarchical Clustering Method Based on Dynamically Updating Constraints

在线阅读下载全文

作  者:周晨曦[1] 梁循[1] 齐金山[1,2] 

机构地区:[1]中国人民大学信息学院,北京100872 [2]淮阴师范学院计算机科学与技术学院,淮安223300

出  处:《自动化学报》2015年第7期1253-1263,共11页Acta Automatica Sinica

基  金:国家自然科学基金(71271211);北京市自然科学基金(4132067);中国人民大学品牌计划(10XNI029)资助~~

摘  要:提出了一种基于约束动态更新的半监督层次聚类算法.与现存的半监督层次聚类算法类似,该算法也使用了必连和不连约束.但不同的是,该算法并不是在对满足必连约束的数据样本点进行预先划分的基础上依据不连约束进行聚合操作,而是首先将约束扩展为一个闭包,然后在这此基础上直接依据不连约束进行聚合操作,并在聚合的过程中依据聚类结果动态地更新必连和不连约束,以保证最终的聚类结果同时满足必连和不连约束.该算法的优势在于省略了对必连约束的数据样本点进行预先划分的步骤,这一改进能够保证数据样本点获得更为合理的聚合顺序,从而得到更为准确的聚类结果.本文具体给出了该算法基于Ward层次聚类算法的实现,提出了C-Ward算法.实验表明,与其他同类算法相比,无论是在人工模拟数据集还是在现实数据集上,本文提出的算法都表现出了更高的准确性和更强的稳定性.A semi-supervised agglomerative hierarchical clustering method based on dynamically updating constraints is proposing in this research. Following the existing semi-supervised clustering algorithm, this method uses the must-link and cannot-link constraints. Instead of using the idea that the instances with must-link constraints are pre-clustered before agglomerating with the others, this method employs a more general and reasonable process. Firstly, must-link and cannot-link constraints are expanded to compose a constraints closure. Then, a standard agglomeration instructed by cannot-link constraints is processed. During this procedure, the must-link and cannot-link are dynamically updated according to the intermediate clustering results. This updating process guarantees the validity of the final results. The fundamental advantage of this method is omitting the pre-clustering process of the instances with must-link constraints. This modification ensures that data points gain a more reasonable agglomeration order, which may result in a significant improvement on the clustering results. This research also introduces an implementation of this model based on Ward's method, leading to the C-Ward algorithm. The experimental analyses on both artificial simulated datasets and real world datasets show that this method is much better than the others.

关 键 词:半监督聚类 层次聚类 约束 动态更新 Ward算法 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象