Optimal margin distribution machine for semi-supervised clustering (面向半监督聚类的最优间隔分布学习机)  Cited by: 1

Authors: Teng ZHANG [1,2,3,4]; Ming LI [5]; Hai JIN [1,2,3,4]

Affiliations: [1] National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China [2] Service Computing Technology and System Lab, Huazhong University of Science and Technology, Wuhan 430074, China [3] Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan 430074, China [4] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China [5] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Source: Scientia Sinica (Informationis) (《中国科学:信息科学》), 2022, No. 1, pp. 86-98 (13 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62006088, 62076121).

Abstract: Margin-based clustering is a classical family of clustering algorithms. It assumes that the clustering structure can be determined by introducing the margin used in supervised learning: for a good clustering result, when its cluster assignments are used as class labels for supervised learning, the margin-related statistic produced by the resulting classifier is also optimal. Currently, the most effective such statistic is the margin distribution, which is based on the latest margin theory and has achieved better results than optimizing the minimum margin. However, in real clustering tasks, extra supervised information is often available, such as pairwise "must-link" and "cannot-link" constraints between instances, and whether optimizing the margin distribution remains effective in this setting is not yet known. This paper proposes the optimal margin distribution machine for semi-supervised clustering (ODMSSC) as a preliminary exploration of this question. The ODMSSC formulation is a mixed-integer program; we relax it into a saddle point problem via minimax convex relaxation and propose an efficient alternating optimization method to solve it. Experiments on real data sets verify the effectiveness of the proposed method.

Keywords: semi-supervised clustering; constrained clustering; optimal margin distribution machine; margin distribution; margin

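Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering] TP311.13 [Automation and Computer Technology: Control Science and Engineering]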
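
Illustrative note: the abstract names two ingredients, a margin-distribution statistic and pairwise "must-link"/"cannot-link" constraints. The sketch below shows what these quantities look like for one candidate cluster assignment. It is not the ODMSSC algorithm and does not perform the relaxation or the alternating optimization. It assumes a fixed linear scorer `w`, two clusters labelled -1/+1, and it summarises the margin distribution by its mean and variance, which is a common choice but an assumption as far as this record is concerned. The function names `margin_statistics` and `count_violations` are hypothetical.

```python
import numpy as np


def margin_statistics(w, X, y):
    """Return (mean, variance) of the margins y_i * <w, x_i>.

    Assumption: a linear scorer w and cluster labels y in {-1, +1}.
    """
    margins = y * (X @ w)
    return float(margins.mean()), float(margins.var())


def count_violations(y, must_link, cannot_link):
    """Count pairwise constraints broken by the assignment y."""
    broken_ml = sum(1 for i, j in must_link if y[i] != y[j])
    broken_cl = sum(1 for i, j in cannot_link if y[i] == y[j])
    return broken_ml + broken_cl


# Toy data: four points on a line, scored by their first coordinate.
X = np.array([[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
w = np.array([1.0, 0.0])

# Side information: pairs that must share a cluster, pairs that must not.
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 3)]

y_good = np.array([1, 1, -1, -1])   # respects every constraint
y_bad = np.array([1, -1, 1, -1])    # splits both must-link pairs

for name, y in [("good", y_good), ("bad", y_bad)]:
    mean, var = margin_statistics(w, X, y)
    broken = count_violations(y, must_link, cannot_link)
    print(f"{name}: margin mean={mean:.2f}, variance={var:.2f}, violations={broken}")
```

On this toy data the assignment that respects the constraints also has the larger margin mean and the smaller margin variance, which is the kind of agreement between margin statistics and side information that the abstract sets out to study.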

 
