Improved Clustering Algorithm for Mixed Attribute Data Based on K-Prototypes


Authors: NI Dan, LI Zewen

Affiliation: [1] Guangzhou Urban Renewal Planning and Design Research Institute Co., Ltd., Guangzhou 510000

Source: Technology Innovation and Application, 2024, No. 28, pp. 31-34, 38 (5 pages)

Abstract: Attribute data are divided into numerical data and categorical data. Numerical data are normally standardized before computation, because when numerical values differ greatly in magnitude, large values mask the influence of small ones. In the K-prototypes clustering algorithm, however, if the numerical data are standardized while the corresponding categorical data receive no preprocessing and no adjustment during the computation, the influence of the categorical data on the clustering is likely to be inflated. Moreover, the categorical data are not subdivided further, so the algorithm cannot meet the differing requirements of mixed-attribute clustering. Building on the standardization of numerical data, this paper subdivides categorical data into binary data and type data, computes the distance between categorical data with a dissimilarity-coefficient distance, and assigns corresponding weights to the binary and type data. This improves the K-prototypes clustering algorithm so that it can meet different requirements for mixed-attribute data clustering. Finally, the method is implemented in C# on ArcEngine2010.
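The abstract gives only the outline of the method. The Python sketch below shows one way the pieces could fit together, under assumptions that are not taken from the paper. Numeric attributes are z-score standardized, since the abstract says only "standardized". Both binary and type attributes use the simple matching dissimilarity coefficient (p - m) / p, where p is the number of attributes and m the number of matches. The distance between an object and a prototype is the squared Euclidean distance on the standardized numeric part plus `w_bin` and `w_type` times the two coefficients, following the additive form of the original K-prototypes cost. All function names and the weights `w_bin` and `w_type` are hypothetical. This is not the authors' C#/ArcEngine implementation.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each numeric column; a zero-variance column is divided by 1.0.

    ASSUMPTION: the paper only says numeric data are "standardized";
    z-score scaling is one common choice, not necessarily the authors' own.
    """
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def dissimilarity_coefficient(a, b):
    """Simple matching dissimilarity (p - m) / p; 0.0 when there are no attributes."""
    if not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def mixed_distance(x, proto, w_bin, w_type):
    """Squared Euclidean distance on the numeric part plus weighted
    dissimilarity coefficients on the binary and type parts.

    ASSUMPTION: hypothetical weights w_bin / w_type combined additively.
    """
    (num_x, bin_x, type_x), (num_p, bin_p, type_p) = x, proto
    d_num = sum((u - v) ** 2 for u, v in zip(num_x, num_p))
    return (d_num
            + w_bin * dissimilarity_coefficient(bin_x, bin_p)
            + w_type * dissimilarity_coefficient(type_x, type_p))


def column_modes(vectors):
    """Most frequent value of each column (ties go to the first value seen)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*vectors)]


def k_prototypes(numeric, binary, types, k, w_bin=1.0, w_type=1.0,
                 init=None, max_iter=100, seed=0):
    """Return (labels, prototypes); a prototype is (means, binary modes, type modes)."""
    data = list(zip(standardize(numeric), binary, types))
    if init is None:
        init = random.Random(seed).sample(range(len(data)), k)
    protos = [data[i] for i in init]
    labels = None
    for _ in range(max_iter):
        # Assign every object to its nearest prototype under the mixed distance.
        new_labels = [
            min(range(k), key=lambda j: mixed_distance(x, protos[j], w_bin, w_type))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old prototype if a cluster becomes empty
                protos[j] = (
                    [mean(col) for col in zip(*(m[0] for m in members))],
                    column_modes([m[1] for m in members]),
                    column_modes([m[2] for m in members]),
                )
    return labels, protos


if __name__ == "__main__":
    # Toy data: two numeric attributes of very different magnitude,
    # two binary attributes and one type attribute.
    numeric = [[1.0, 100.0], [1.2, 110.0], [0.9, 95.0],
               [10.0, 1000.0], [10.5, 1050.0], [9.8, 980.0]]
    binary = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
    types = [["red"], ["red"], ["red"], ["blue"], ["blue"], ["green"]]
    labels, protos = k_prototypes(numeric, binary, types, k=2, init=[0, 3])
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Both coefficients lie in [0, 1], so `w_bin` and `w_type` directly set how much a complete binary or type mismatch costs relative to squared distance on the standardized numeric scale. Raising one of them lets that kind of categorical attribute weigh more in the assignment, which corresponds to the per-type weighting the abstract describes. For the binary part, an asymmetric coefficient that ignores joint absences (0 and 0) would be an alternative choice.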

Keywords: K-prototypes algorithm; mixed attributes; type data; dissimilarity coefficient; weighted attributes

CLC number: TP311.1 [Automation and Computer Technology: Computer Software and Theory]

 
