Interval Function Type Clustering Method Under Generalized Distribution

Authors: SUN Lirong [1,2]; JIANG Chenkai; TIAN Yinghua; GUO Baocai [1,2]

Affiliations: [1] School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018; [2] Collaborative Innovation Center of Statistical Data Engineering Technology & Application, Zhejiang Gongshang University, Hangzhou 310018

Source: Journal of Systems Science and Mathematical Sciences, 2024, No. 8, pp. 2496-2514 (19 pages)

Funding: Supported by a Key Project of the National Social Science Fund of China (23ATJ009).

Abstract: Interval functional clustering is a method for analyzing continuous high-frequency data. Existing interval functional clustering methods assume a uniform distribution within each interval and therefore cannot make full use of the distribution information inside the interval. Moreover, the uniform assumption does not match the actual distribution of many data sets, which leads to poor clustering performance and stability. To address these problems, this article takes the actual data distribution into account: using the mean and standard deviation of the original data, it improves the existing midpoint-radius method and proposes an interval functional clustering method based on a generalized distribution. The method broadens the applicability of interval functional clustering; it describes the distribution within each interval more accurately and makes fuller use of the intrinsic features of the data, improving the validity and reasonableness of the clustering results. Monte Carlo simulation is used to compute internal clustering validity indices and to compare the proposed method with the existing interval functional clustering method under the uniform distribution; the results show that the proposed method performs better. Finally, the proposed method is applied to a cluster analysis of atmospheric pollutant concentrations in different cities, which verifies that it solves practical problems effectively and has clear advantages over existing methods.
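As a rough illustration of the contrast the abstract draws, the sketch below summarizes the raw high-frequency readings inside each time window in two ways: by interval midpoint and radius, which uses only the minimum and maximum of the window, and by mean and standard deviation, which reflects how the readings are spread inside the window. The function names, the fixed-length windowing, and the squared Euclidean distance are assumptions made for illustration only; the paper's actual mean-standard deviation distance and its functional representation are not specified in the abstract.

```python
import numpy as np

def midpoint_radius(raw, window):
    """Summarize each window by interval midpoint and radius (uses min/max only).

    Assumes len(raw) is a multiple of `window`.
    """
    blocks = np.asarray(raw, dtype=float).reshape(-1, window)
    lo, hi = blocks.min(axis=1), blocks.max(axis=1)
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def mean_std(raw, window):
    """Summarize each window by the mean and (population) standard deviation."""
    blocks = np.asarray(raw, dtype=float).reshape(-1, window)
    return blocks.mean(axis=1), blocks.std(axis=1)

def pair_distance(a, b):
    """Hypothetical squared Euclidean distance between two (center, spread) summaries."""
    return float(np.sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))

# Two series whose windows share the same minimum and maximum
# but whose readings are distributed differently inside each window.
x = [0, 0, 0, 10, 0, 0, 0, 10]
y = [0, 10, 10, 10, 0, 10, 10, 10]

d_mr = pair_distance(midpoint_radius(x, 4), midpoint_radius(y, 4))
d_ms = pair_distance(mean_std(x, 4), mean_std(y, 4))
print(d_mr, d_ms)
```

In this toy case the midpoint-radius summaries of the two series coincide, so their distance is zero, while the mean-standard deviation summaries differ. This is the kind of within-interval information the abstract says is lost under the uniform assumption.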

Keywords: interval functional data; mean-standard deviation distance; generalized distribution; cluster analysis

Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
