A K-Nearest Neighbor Method Suitable for Mixed-Attribute Data (cited by: 2)

A Novel K-Nearest Neighbor Method with an Application to Mixed-Attribute Data

Authors: LIU Jia-yu; ZHOU Ling-yun; WU Qiu-feng [4]; MENG Xiang-yan [4]; DENG Hua-ling [4]

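Affiliations: [1] College of Economics and Management, Northeast Agricultural University, Harbin 150030, Heilongjiang, China; [2] Department of Economics, Heilongjiang University of Finance and Economics, Harbin 150030, Heilongjiang, China; [3] College of Engineering, Northeast Agricultural University, Harbin 150030, Heilongjiang, China; [4] College of Science, Northeast Agricultural University, Harbin 150030, Heilongjiang, China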

Source: Mathematics in Practice and Theory (《数学的实践与认识》), 2020, Issue 16, pp. 132-143 (12 pages)

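Funding: Special Fund for Agro-scientific Research in the Public Interest, second-level task (201503116-04-06); Heilongjiang Postdoctoral Fund (LBHZ15020); National Science and Technology Support Program, special task (2014BAD12B01-1-3); Harbin Special Research Fund for Science and Technology Innovation Talents (Young Reserve Talents) (2017RAQXJ096); Integration and Demonstration of High-Efficiency Water Use Technology for Japonica Rice in Semi-humid Regions (2018YFD0300105-2).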

Abstract: The traditional K-nearest neighbor (KNN) algorithm applies only to numerical attribute data. To address this limitation, this paper proposes a KNN algorithm that assigns different weights to different attribute columns of mixed-attribute data (K Nearest Neighbor for Mixed-attribute Data, KNNM), so that KNN can be applied to mixed-attribute data. In mixed-attribute data, the numerical part and the categorical part contribute differently to the overall similarity measure, and each component contributes differently to the similarity measure of the attribute part to which it belongs. The paper therefore proposes a vector-based parameter calculation method that accounts for two things: the contribution of the numerical part and of the categorical part, each taken as a whole, to the similarity between mixed-attribute samples, and the contribution of each component to the similarity within its own part. On this basis it presents a KNN method suitable for mixed-attribute data. Experimental results on five UCI datasets show that KNNM outperforms traditional KNN in accuracy, macro-averaged recall, macro-averaged precision, macro-averaged F1 and ROC, which indicates that KNNM is applicable and effective for mixed-attribute data.

Keywords: mixed-attribute data; similarity measure; K-nearest neighbor; parameter calculation method; principal component analysis

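Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP18 [Automation and Computer Technology: Computer Science and Technology]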
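
To make the two-level weighting described in the abstract concrete, the following is a minimal Python sketch. It is not the paper's KNNM implementation: the abstract gives neither the exact distance formula nor the rule for computing the weights, so everything here is an assumption for illustration. The sketch assumes a weighted Euclidean distance on the numerical part, a weighted mismatch count on the categorical part, numerical features already scaled to a comparable range, and caller-supplied weights. The names `mixed_distance`, `knn_predict`, `w_num`, `w_cat`, `alpha` and `beta` are hypothetical.

```python
import math
from collections import Counter


def mixed_distance(a_num, a_cat, b_num, b_cat, w_num, w_cat, alpha, beta):
    """Distance between two mixed-attribute samples (illustrative assumption).

    w_num / w_cat weight each component inside its own part;
    alpha / beta weight the numerical and categorical parts as wholes.
    """
    # Weighted Euclidean distance over the numerical components.
    num_part = math.sqrt(
        sum(w * (x - y) ** 2 for w, x, y in zip(w_num, a_num, b_num))
    )
    # Weighted mismatch count over the categorical components.
    cat_part = sum(w * (x != y) for w, x, y in zip(w_cat, a_cat, b_cat))
    return alpha * num_part + beta * cat_part


def knn_predict(train, q_num, q_cat, k, w_num, w_cat, alpha, beta):
    """Majority vote among the k training samples nearest to the query.

    train is a list of (numerical_features, categorical_features, label).
    """
    ranked = sorted(
        train,
        key=lambda row: mixed_distance(
            row[0], row[1], q_num, q_cat, w_num, w_cat, alpha, beta
        ),
    )
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, made up purely for illustration (not from the paper).
train = [
    ((0.10, 0.20), ("red", "s"), "A"),
    ((0.20, 0.10), ("red", "m"), "A"),
    ((0.90, 0.80), ("blue", "l"), "B"),
    ((0.80, 0.90), ("blue", "l"), "B"),
    ((0.85, 0.70), ("red", "l"), "B"),
]
weights = dict(w_num=(0.5, 0.5), w_cat=(0.5, 0.5), alpha=0.6, beta=0.4)

pred_a = knn_predict(train, (0.15, 0.15), ("red", "s"), k=3, **weights)
pred_b = knn_predict(train, (0.90, 0.90), ("blue", "l"), k=3, **weights)
print(pred_a, pred_b)
```

In this sketch the weights are fixed by hand. The keywords suggest the paper derives them through a parameter calculation method involving principal component analysis, a step the abstract does not specify and which is therefore left out here.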
