Authors: SUN Lirong (孙利荣) [1,2]; JIANG Chenkai (蒋晨锴); TIAN Yinghua (田颖华); GUO Baocai (郭宝才) [1,2]
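Affiliations: [1] School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018; [2] Collaborative Innovation Center of Statistical Data Engineering Technology & Application, Zhejiang Gongshang University, Hangzhou 310018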
Source: Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), 2024, No. 8, pp. 2496-2514 (19 pages)
Funding: Supported by a Key Project of the National Social Science Fund of China (23ATJ009).
Abstract: Interval functional clustering is a method for analyzing continuous high-frequency data. Existing interval functional clustering methods assume a uniform distribution and therefore cannot fully use the distributional information inside each interval. Moreover, the uniform-distribution assumption does not match the actual distribution of many data sets, which leads to poor clustering performance and stability. To address these problems, this paper takes the actual distribution of the data into account: using the mean and standard deviation of the original data, it improves the existing midpoint-radius method and proposes an interval functional clustering method based on a generalized distribution. The method widens the range of applicability of interval functional clustering; it not only describes the distribution inside each interval better, but also makes fuller use of the intrinsic features of the data, improving the validity and reasonableness of the clustering results. Monte Carlo simulation is used to compute internal indices of clustering quality and to compare the proposed method with the existing interval functional clustering method under the uniform distribution; the results show that the proposed method performs better. Finally, the proposed method is applied to a cluster analysis of atmospheric pollutant concentrations in different cities, which verifies that it solves practical problems effectively and has clear advantages over existing methods.
Keywords: interval functional data; mean-standard deviation distance; generalized distribution; cluster analysis
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]