Deep Embedding Clustering Combining Force-directed Graph Distribution and Feature Weighting Idea


Authors: Lü Wei; QIAN Yuhua [1,2,3]; WANG Jieting; LI Feijiang; HU Shen [1,3]

Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; [2] Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China; [3] Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China

Source: Journal of Chinese Computer Systems, 2024, Issue 6, pp. 1318-1324 (7 pages)

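Funding: Supported by the Key Program of the National Natural Science Foundation of China (62136005); the Young Scientists Fund of the National Natural Science Foundation of China (62106132); the National Key Research and Development Program of China (2021ZD0112400); and the Fundamental Research Program of Shanxi Province (20210302124271, 202103021223026).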

Abstract: Cluster analysis, an important research direction in unsupervised learning, is at the core of many data-driven applications. However, the distance concentration characteristic of high-dimensional data (distances between samples become nearly indistinguishable as dimensionality grows) destroys the neighbor structure of samples in high-dimensional space, so the performance of many distance-based (neighbor-based) clustering algorithms drops sharply. Many researchers currently hold that high-dimensional data often contain many task-irrelevant and mutually correlated features, so that their intrinsic dimensionality is often much lower than the original feature dimensionality. To learn low-dimensional equivalent representations of samples, deep embedding learning based on deep autoencoders preserves as much reconstruction information as possible. However, existing methods of this kind usually need a clustering loss to guide clustering; although this improves clustering performance, the inherent conflict between the clustering loss and the reconstruction loss limits further improvement. Dimensionality reduction methods based on force-directed graph distribution algorithms instead learn low-dimensional representations while preserving as much neighbor structure information as possible, but distance concentration makes it difficult for them to capture the high-dimensional neighbor structure accurately. Building on deep autoencoders and force-directed graph distribution algorithms, this paper introduces the idea of feature weighting, so that the model has a strong capacity for low-dimensional equivalent representation and for highlighting cluster structure according to the data's neighbor structure, while also taking into account how suitable each feature is for the clustering task. Comparative experiments against the latest high-dimensional clustering algorithms on five datasets demonstrate the soundness and superiority of the proposed algorithm.
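The abstract's core argument, that distance concentration and task-irrelevant features destroy nearest-neighbor structure and that down-weighting unsuitable features can restore it, can be illustrated with a small standard-library Python sketch. This is not the authors' model: the deep autoencoder and force-directed graph distribution components are omitted entirely, and the toy data, the helper names (`relative_contrast`, `weighted_dist`, `make_toy_data`, `nn_purity`) and the weight vectors are hypothetical. The abstract does not say how the paper obtains its feature weights, so here they are simply fixed by hand.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min from one random query to n_points uniform points in [0, 1]^dim.

    A value near 0 means the nearest and farthest points are almost equally far away,
    i.e. distances have concentrated.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


def weighted_dist(a, b, w):
    """Feature-weighted Euclidean distance: sqrt(sum_j w_j * (a_j - b_j)^2)."""
    return math.sqrt(sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b)))


def make_toy_data(n_per_cluster=40, n_noise=300, seed=1):
    """Hypothetical data: two Gaussian clusters in 2 informative features,
    padded with n_noise uniform features that carry no cluster information."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_cluster):
            informative = [rng.gauss(c, 0.5) for c in centre]
            noise = [rng.uniform(0.0, 10.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def nn_purity(X, y, w):
    """Fraction of points whose nearest neighbor (under weights w) has the same cluster label."""
    hits = 0
    for i, xi in enumerate(X):
        best_j = min((j for j in range(len(X)) if j != i),
                     key=lambda j: weighted_dist(xi, X[j], w))
        hits += y[best_j] == y[i]
    return hits / len(X)


if __name__ == "__main__":
    # Distance concentration: the contrast collapses as the dimension grows.
    print("contrast d=2:", relative_contrast(2), " d=1000:", relative_contrast(1000))

    X, y = make_toy_data()
    n_feat = len(X[0])
    uniform_w = [1.0] * n_feat                         # every feature counts equally
    hand_w = [1.0, 1.0] + [0.0] * (n_feat - 2)         # hypothetical hand-set weights
    print("1-NN purity, unweighted:", nn_purity(X, y, uniform_w))
    print("1-NN purity, weighted:  ", nn_purity(X, y, hand_w))
```

With these settings the contrast at 1000 dimensions should be far smaller than at 2 dimensions, and the hand-weighted nearest-neighbor purity should be close to 1 while the unweighted purity stays well below it, because the 300 noise features dominate the unweighted distance. In a real method the weights would have to be estimated from unlabeled data rather than chosen with knowledge of which features are informative.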

Keywords: high-dimensional clustering; deep autoencoder; feature weighting; force-directed graph distribution algorithm

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
