Improved Density Peaks Clustering Algorithm for Big Data in Cloud Computing Environment  Cited by: 5


Authors: ZHENG Donghua; YE Lizhu; SUI Dong; HUANG Jintao

Affiliations: [1] School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363, Guangdong, China; [2] Graduate School, Management and Science University, Shah Alam 40100, Selangor, Malaysia; [3] School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102406, China; [4] Faculty of Science and Technology, University of Macao, Macao 999078, China

Source: Journal of University of Jinan (Science and Technology), 2022, No. 5, pp. 592-596, 602 (6 pages in total)

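Funding: National Natural Science Foundation of China (61702026); Guangdong Higher Education Association Project (21GYB08); Guangzhou Philosophy and Social Sciences Development Planning Project (2021GZGJ145); Guangdong Province Higher Education Characteristic Specialty Construction Project (2020SJTSZY01); Guangdong Province Ordinary Universities Characteristic Innovation Project (2021KTSCX150); Industry-University Cooperative Education Project of the Department of Higher Education, Ministry of Education (202002030019).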

Abstract: The density peaks clustering (DPC) algorithm is improved. The distances between sample points and the local density of each sample point are computed; sample points for which both values are large are selected as cluster centers, and every other sample point is assigned a class according to its distance to each center. The K-nearest neighbor (KNN) algorithm is introduced to optimize DPC: when computing distances for a sample point, only the surrounding sample points determined by the neighbor value k are considered, so the distance threshold is selected automatically. The density of each sample point is computed from the distance matrix, a decision graph is drawn and used to select the cluster centers, and the remaining points are assigned, according to their density values, to the class of the nearest center. Finally, the KNN-DPC algorithm is deployed on a Hadoop cloud computing platform to cluster large-scale data. Simulation results show that, with a suitably chosen neighbor value k, the KNN-DPC algorithm clusters big data samples well; compared with commonly used clustering algorithms, it achieves higher clustering accuracy and efficiency and is suitable for clustering big data samples.
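The pipeline summarized in the abstract (KNN-based density, distance to the nearest denser point, decision-graph selection of centers, assignment of the remaining points) can be illustrated with a small single-machine sketch. Everything named here is hypothetical and not taken from the paper: the function `knn_dpc`, the density form exp(-mean squared KNN distance), the use of the product rho * delta to rank decision-graph candidates, and the rule that each remaining point inherits the label of its nearest denser neighbor (the usual DPC assignment). The number of clusters is passed in rather than read off a plotted decision graph, and the Hadoop deployment is not shown.

```python
import math


def knn_dpc(points, k, n_clusters):
    """Illustrative KNN-based density peaks clustering (assumes 1 <= k < len(points))."""
    n = len(points)
    # Pairwise Euclidean distance matrix.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density from the k nearest neighbors only, so no cutoff
    # distance has to be chosen by hand (assumed density form).
    rho = []
    for i in range(n):
        knn = sorted(dist[i][j] for j in range(n) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in knn) / len(knn)))

    # delta[i]: distance to the nearest point that ranks higher in density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the densest point
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j

    # Decision graph: points with both large rho and large delta are centers.
    # Ranking by rho * delta is an assumed stand-in for visual selection.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Visit points from dense to sparse; each unlabeled point inherits the
    # label of its nearest denser neighbor, which is already labeled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (made-up data).
    demo = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    print(knn_dpc(demo, k=3, n_clusters=2))
```

The full distance matrix makes this sketch quadratic in the number of samples, which is the cost a distributed deployment such as the one described in the abstract would aim to spread across nodes.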

Keywords: big data; cloud computing; density peaks clustering; K-nearest neighbor algorithm; decision graph

Classification: TP301 [Automation and Computer Technology: Computer Systems Architecture]

 
