A Mixed Data K-prototypes Algorithm Fused with α-Metrics  Times cited: 2

Authors: Chen Jiajia, Zhang Wang, Liu Donghai, Zhang Xiaoqin (School of Statistics, Shanxi University of Finance and Economics, Taiyuan 030006, China)

Affiliation: [1] School of Statistics, Shanxi University of Finance and Economics, Taiyuan 030006

Source: Statistics & Decision (《统计与决策》), 2023, No. 10, pp. 16-22 (7 pages)

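Funding: Shanxi Province Basic Research Program (202103021223304); Shanxi Province Teaching Reform and Innovation Project for Higher Education Institutions (J20220570).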

Abstract: In the big-data era, categorical and mixed-type data have become abundant, and how to better measure the dissimilarity of such data has become a research focus. Compared with representing each cluster by specific attribute values, the fuzzy class center representation has been widely adopted because it carries more information, allows Euclidean distances to be computed, and reflects the differences between samples more completely. A fuzzy class center is a frequency vector whose entries sum to 1, which also satisfies the definition of compositional data. This paper therefore introduces techniques from compositional data analysis and proposes an improved K-prototypes algorithm fused with the α-metric (α-K-prototypes). To account for the particular properties of the α-metric, a weight adjustment coefficient is introduced so that distances on categorical attributes are more interpretable. In comparative experiments, α-K-prototypes outperforms the K-prototypes, K-centers, and Improved-K-prototypes algorithms on all seven UCI data sets tested. For practical application, the paper also gives a criterion for computing a better value of α and shows that it is statistically significant.
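The abstract does not state the α-metric or the weighting rule explicitly. As a rough illustration only, the Python sketch below assumes the power-transformation α-metric from compositional data analysis, Δ_α(x, y) = (D/α)·sqrt(Σ_i (x_i^α/Σ_j x_j^α - y_i^α/Σ_j y_j^α)²) with α > 0, which tends to the Aitchison distance as α → 0 for strictly positive compositions. It treats a single categorical value as a one-hot composition and a fuzzy class center as the cluster's category frequency vector. The function names and the weight `gamma` (a stand-in for the paper's weight adjustment coefficient) are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence


def alpha_metric(x: Sequence[float], y: Sequence[float], alpha: float) -> float:
    """Assumed power-transformation alpha-metric between two compositions.

    Both inputs are non-negative vectors summing to 1. The form used here is
    (D / alpha) * sqrt(sum_i (x_i^a / sum_j x_j^a - y_i^a / sum_j y_j^a)^2).
    alpha must be > 0 so that zero parts (e.g. one-hot vectors) are allowed.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive when parts may be zero")
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    d = len(x)
    px = [v ** alpha for v in x]
    py = [v ** alpha for v in y]
    sx, sy = sum(px), sum(py)
    sq = sum((a / sx - b / sy) ** 2 for a, b in zip(px, py))
    return d / alpha * math.sqrt(sq)


def frequency_center(values: Sequence[str], categories: Sequence[str]) -> List[float]:
    """Fuzzy class center of one categorical attribute: category frequencies in a cluster."""
    counts = Counter(values)
    return [counts[c] / len(values) for c in categories]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode a single categorical value as a degenerate composition (one part equals 1)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]


def mixed_dissimilarity(num_x, num_center, cat_x, cat_centers, categories_per_attr,
                        alpha: float, gamma: float) -> float:
    """Hypothetical mixed dissimilarity between a sample and a cluster prototype.

    Numeric part: squared Euclidean distance, as in classic K-prototypes.
    Categorical part: alpha-metric between each one-hot value and the cluster's
    frequency vector, scaled by gamma (a stand-in for the paper's unspecified
    weight adjustment coefficient).
    """
    num_part = sum((a - b) ** 2 for a, b in zip(num_x, num_center))
    cat_part = sum(
        alpha_metric(one_hot(v, cats), center, alpha)
        for v, center, cats in zip(cat_x, cat_centers, categories_per_attr)
    )
    return num_part + gamma * cat_part


# Usage: two numeric attributes plus one categorical attribute with categories a/b/c.
cats = ["a", "b", "c"]
center = frequency_center(["a", "a", "b", "c"], cats)  # [0.5, 0.25, 0.25]
print(mixed_dissimilarity([1.0, 2.0], [0.0, 0.0], ["a"], [center], [cats],
                          alpha=1.0, gamma=0.5))
```

With α = 1 this metric reduces to D times the Euclidean distance between the two frequency vectors. A positive α is required here because one-hot vectors contain zero parts, which log-ratio (Aitchison) distances cannot handle directly.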

Keywords: cluster analysis; compositional data; mixed data; fuzzy class center

CLC number: O212.1 [Science: Probability Theory and Mathematical Statistics]