Authors: HUANG Lecheng (黄乐成); CHEN Chao (陈超); HAN Cunxin (韩存鑫); ZHAO Bin (赵彬)
Affiliation: [1] School of Computer Science and Engineering, Sichuan University of Light Chemical Technology (四川轻化工大学), Zigong 643000, Sichuan, China
Source: Research and Exploration in Laboratory (《实验室研究与探索》), 2022, No. 9, pp. 135-139 (5 pages)
Abstract: Applying traditional data mining methods to the open high-resolution air pollution reanalysis dataset for China (2013-2018) runs into large data volumes and low mining efficiency, so a K-means clustering method running on Spark is used instead to study the massive volume of air pollutant information. Taking six common air pollutants and five environmental impact factors as examples, a data-dimension model is built over PM2.5, PM10, SO2, NO2, CO, O3, Temp, and other dimensions. For choosing the initial number of clusters K for the K-means algorithm, the Gap Statistic algorithm is compared with the traditional approach of determining K from the SSE (sum of squared errors). On the high-dimensional sample data model, the K values determined by the Gap Statistic algorithm are more reasonable and more intuitive than those determined from the SSE.
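Keywords: air pollution data; cluster analysis; Gap Statistic algorithm; error analysis
Classification: TP399 [Automation and Computer Technology: Computer Application Technology]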
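
The abstract compares two ways of choosing K for K-means: reading K off the SSE curve, and the Gap Statistic. The sketch below illustrates the general Gap Statistic procedure on a small synthetic dataset. It is not the authors' code: it uses plain NumPy rather than Spark, and the function names (`kmeans_sse`, `gap_statistic`, `choose_k`), the k-means++ seeding, the uniform bounding-box reference data, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def _sq_dists(X, centers):
    """Squared Euclidean distance from every point to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans_sse(X, k, rng, n_iter=50, n_init=5):
    """Lloyd's k-means with k-means++ seeding; returns the lowest SSE found."""
    best = np.inf
    for _ in range(n_init):
        # k-means++ seeding: later centers are drawn far from earlier ones.
        centers = X[rng.integers(len(X))][None, :]
        for _ in range(1, k):
            d2 = _sq_dists(X, centers).min(axis=1)
            nxt = X[rng.choice(len(X), p=d2 / d2.sum())]
            centers = np.vstack([centers, nxt])
        for _ in range(n_iter):
            labels = _sq_dists(X, centers).argmin(axis=1)
            # An empty cluster keeps its previous center.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        best = min(best, _sq_dists(X, centers).min(axis=1).sum())
    return best


def gap_statistic(X, k_max, n_refs=10, seed=0):
    """Gap(k) = mean(log SSE of reference data) - log SSE of the real data."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans_sse(X, k, rng))
        # Reference datasets: uniform samples over the data's bounding box.
        ref = np.array([
            np.log(kmeans_sse(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(n_refs)
        ])
        gaps.append(ref.mean() - log_w)
        errs.append(ref.std() * np.sqrt(1 + 1 / n_refs))
    return np.array(gaps), np.array(errs)


def choose_k(gaps, errs):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to k_max."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - errs[i + 1]:
            return i + 1
    return len(gaps)


# Synthetic stand-in data: three well-separated 2-D blobs (not the paper's dataset).
demo_rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X_demo = np.vstack([c + 0.5 * demo_rng.standard_normal((40, 2)) for c in blob_centers])

demo_gaps, demo_errs = gap_statistic(X_demo, k_max=5)
print("gap values:", np.round(demo_gaps, 2))
print("chosen K:", choose_k(demo_gaps, demo_errs))
```

With squared Euclidean distance, the within-cluster dispersion used by the Gap Statistic is the same quantity as the K-means SSE, so the two approaches differ only in how they read it. The SSE method looks for an elbow in the raw curve, while the Gap Statistic compares each SSE against what structureless reference data would give and applies an explicit selection rule.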