Affiliation: [1] Guangzhou Urban Renewal Planning and Design Research Institute Co., Ltd. (广州市城市更新规划设计研究院有限公司), Guangzhou 510000
Source: Technology Innovation and Application (《科技创新与应用》), 2024, No. 28, pp. 31-34, 38 (5 pages)
Abstract: Attribute data can be divided into numerical data and categorical data. Numerical data is normally standardized before computation. For data whose numerical values differ greatly in magnitude, large values mask the influence of small ones. Under the K-prototypes clustering algorithm, if the numerical data is standardized while the corresponding categorical data is neither preprocessed nor adjusted during computation, the influence of the categorical data on the clustering is likely to be inflated. In addition, the categorical data is not subdivided any further, so the algorithm cannot satisfy the differing requirements of mixed-attribute clustering. Building on standardized numerical data, this paper subdivides categorical data into binary data and type data, computes distances between categorical values with a dissimilarity-coefficient distance, and assigns corresponding weights to the binary and type data. These changes improve the K-prototypes clustering algorithm so that it can meet different requirements for clustering mixed-attribute data. The improved algorithm is implemented in C# on ArcEngine 2010.
Keywords: K-prototypes algorithm; mixed attributes; type data; dissimilarity coefficient; weighted attributes
Classification number: TP311.1 [Automation and Computer Technology: Computer Software and Theory]
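The abstract describes the improved distance only in words, so the following is a minimal Python sketch of one way such a mixed-attribute distance could look, not the paper's actual formula or its C#/ArcEngine implementation. It assumes z-score standardization for the numerical attributes, a simple-matching dissimilarity coefficient (share of mismatched attributes) for both the binary and the type attributes, and a weighted sum of the three parts. The function names and the weights `w_bin` and `w_type` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each numeric column; a constant column maps to 0.0.

    Assumption: the abstract only says "standardized"; z-scores are one
    common choice, not necessarily the one used in the paper.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, mus, sds)]
        for row in rows
    ]


def mismatch_ratio(a, b):
    """Simple-matching dissimilarity: fraction of attributes that differ."""
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mixed_distance(num_a, num_b, bin_a, bin_b, type_a, type_b,
                   w_bin=1.0, w_type=1.0):
    """Squared Euclidean distance on standardized numeric attributes plus
    weighted dissimilarities for binary and type attributes.

    w_bin and w_type are hypothetical weights standing in for the
    per-kind weights mentioned in the abstract.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    return (numeric
            + w_bin * mismatch_ratio(bin_a, bin_b)
            + w_type * mismatch_ratio(type_a, type_b))


# Two records: numeric columns on very different scales, one pair of
# binary flags, and one type attribute (illustrative values only).
numeric_std = standardize([[1.0, 100.0], [3.0, 300.0]])
d = mixed_distance(numeric_std[0], numeric_std[1],
                   [0, 1], [0, 0],
                   ["road"], ["park"],
                   w_bin=0.5, w_type=2.0)
print(d)
```

Keeping the binary and type parts as separate ratios, each with its own weight, lets the relative influence of each attribute kind be tuned independently of how many attributes of that kind a record has.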