DMKK-means: a deep multiple kernel K-means clustering algorithm


Authors: WANG Mei[1,2]; SONG Kaiwen; LIU Yong; WANG Zhibao[1]; WAN Da[1] (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; Heilongjiang Key Laboratory of Petroleum Big Data and Intelligent Analysis, Daqing 163318, Heilongjiang, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100049, China; Beijing Key Laboratory of Big Data Management and Analysis Method (School of Information, Renmin University of China), Beijing 100049, China)

Affiliations: [1] School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, Heilongjiang, China; [2] Heilongjiang Key Laboratory of Petroleum Big Data and Intelligent Analysis, Daqing 163318, Heilongjiang, China; [3] Gaoling School of Artificial Intelligence, Renmin University of China, Beijing 100049, China; [4] Beijing Key Laboratory of Big Data Management and Analysis Method (School of Information, Renmin University of China), Beijing 100049, China

Source: Journal of Shandong University (Engineering Science), 2024, No. 6, pp. 1-7, 18 (8 pages)

Funds: National Natural Science Foundation of China (51774090, 62076234); Heilongjiang Postdoctoral Research Startup Fund (LBH-Q20080); Natural Science Foundation of Heilongjiang Province (LH2020F003); Fundamental Research Funds for Heilongjiang Provincial Universities (KYCXTD201903, YYYZX202105)

Abstract: The proposed algorithm, deep multiple kernel K-means (DMKK-means), addressed the limitations of traditional K-means clustering, which was sensitive to sample distribution and performed poorly on complex problems due to the limited expressive power of its kernel representations. By leveraging the strong representational capability of deep kernels and employing a multi-kernel ensemble approach, DMKK-means constructed a highly expressive deep multiple kernel network architecture and performed K-means clustering in the resulting feature space. The dissimilarity between this algorithm and two baseline clustering methods was quantified using a clustering loss function based on Kullback-Leibler (KL) divergence. The clustering algorithm was modeled as an efficient end-to-end learning problem, and the weight parameters of the deep multiple kernel network were optimized through stochastic gradient descent. Experimental results on multiple standard datasets demonstrated the superiority of the proposed algorithm over K-means, radial basis function kernel K-means (RBFKKM), and other multiple kernel K-means clustering algorithms in terms of clustering accuracy, normalized mutual information, and adjusted Rand index, validating its feasibility and effectiveness.
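The abstract outlines a pipeline of multi-kernel combination, K-means in the induced feature space, a KL-divergence clustering loss, and end-to-end SGD training. As a rough illustration of the multiple-kernel K-means component only, the sketch below combines several RBF base kernels with fixed convex weights and runs Lloyd-style kernel K-means on the resulting Gram matrix. The kernel widths, the fixed weights, and the toy data are illustrative assumptions; this is not the paper's implementation, which learns a deep multiple kernel network and its weights by stochastic gradient descent.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gram matrix of an RBF kernel: K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combined_kernel(X, gammas, weights):
    """Convex combination of base RBF kernels (the multiple-kernel part)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * rbf_kernel(X, g) for wi, g in zip(w, gammas))

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Lloyd-style kernel K-means on a precomputed Gram matrix K."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(0, k, size=n)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters stay at +inf
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_{j,j'} K_jj'
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])
K = combined_kernel(X, gammas=[0.1, 1.0], weights=[0.5, 0.5])
labels = kernel_kmeans(K, k=2)
```

In DMKK-means proper, the kernel combination weights are parameters of a deep multiple kernel network trained end-to-end against the KL-based clustering loss, rather than fixed constants as above.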

Keywords: K-means; kernel clustering; deep multiple kernel learning; data mining; gradient descent

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]

 
