Authors: XUE Li-xiang [1], GAO Li-jie [1], LI Zhan-bo [2]
Affiliations: [1] College of Information Engineering, Zhengzhou University of Science and Technology, Zhengzhou, Henan 450064, China; [2] Network Management Center, Zhengzhou University, Zhengzhou, Henan 450001, China
Source: Computer Simulation (《计算机仿真》), 2024, No. 3, pp. 542-547 (6 pages)
Abstract: As information technology develops, both the volume and the variety of data keep growing. High dimensionality and heavy duplication make it difficult to extract useful information from such data sets. To address this, the paper proposes a multi-dimensional data clustering algorithm based on an improved sparse autoencoder. The algorithm has two parts: data processing and cluster analysis. In data processing, the layer-by-layer greedy principle of S-SAE first reduces the high-dimensional data set to groups of 6 dimensions each. A mapping value matching mechanism then cleans duplicate data from the reduced data set, replacing each cleaned value with 0. The processed data are then passed to the K-Means++ clustering algorithm for cluster analysis. The result is the TS-SAE-K-Means++ multi-dimensional data clustering model, whose optimal parameter settings are obtained through optimization analysis. Simulation comparisons against different baseline algorithm combinations show that TS-SAE-K-Means++ outperforms the other combinations on both the clustering silhouette coefficient S and the model's F1 value. This indicates that the proposed algorithm has some advantage in extracting useful information from high-dimensional data.
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]
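The last two stages described in the abstract (duplicate cleaning with zero substitution, then K-Means++ clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mapping value matching mechanism, so exact row equality after rounding is used here as a stand-in, and the autoencoder-based reduction step is omitted (the input is assumed to be already reduced to 6 dimensions). The function names `zero_out_duplicates`, `kmeans_pp_init`, and `kmeans` are hypothetical.

```python
import random


def zero_out_duplicates(rows):
    """Replace each repeated row (after its first occurrence) with zeros.

    Assumption: equality of rounded values stands in for the paper's
    mapping value matching mechanism, which the abstract does not detail.
    """
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(round(v, 6) for v in row)
        if key in seen:
            cleaned.append([0.0] * len(row))  # cleaned values become 0
        else:
            seen.add(key)
            cleaned.append(list(row))
    return cleaned


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(rows, k, rng):
    """K-Means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(rows))]
    while len(centers) < k:
        d2 = [min(sq_dist(r, c) for c in centers) for r in rows]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing center
            centers.append(list(rng.choice(rows)))
            continue
        threshold = rng.random() * total
        cumulative = 0.0
        for r, w in zip(rows, d2):
            cumulative += w
            if w > 0 and cumulative >= threshold:
                centers.append(list(r))
                break
    return centers


def kmeans(rows, k, iters=50, seed=0):
    """Lloyd iterations starting from K-Means++ seeds."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(rows, k, rng)
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign every row to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(r, centers[j]))
                  for r in rows]
        # Recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(zero_out_duplicates(data), k)` on a list of 6-dimensional rows returns one cluster label per row plus the final centers. Note that zeroed rows remain in the data set and are clustered like any other point, which follows the abstract's wording but may differ from how the paper treats them.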