Mechanism analysis of the accelerator for k-nearest neighbor algorithm based on data partition (基于数据划分的k-近邻分类加速算法机理分析)

Cited by: 1

Authors: SONG Yunsheng [1], WANG Jie [1], LIANG Jiye [1,2]

Affiliations: [1] School of Computer & Information Technology, Shanxi University, Taiyuan 030006, Shanxi; [2] Key Laboratory of Computation Intelligence & Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006, Shanxi

Source: Journal of University of Science and Technology of China (JUSTC), 2018, No. 4, pp. 331-340 (10 pages)

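Funding: Supported by the Key Program of the National Natural Science Foundation of China (U1435212; 61432011) and the Shanxi Province Key Science and Technology Research Project (MQ2014-09)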

Abstract: The k-nearest neighbor (kNN) classification algorithm makes no assumptions about the underlying data distribution, is simple to run, and generalizes well, so it is widely used in face recognition, text classification, sentiment analysis, and other fields. kNN has no training phase: it simply stores the training instances and predicts the class of a test instance by comparing its similarity with the stored training instances. Because kNN must compute the similarity between the test instance and every training instance, it is difficult to apply efficiently to large-scale data. To overcome this difficulty, the search for nearest neighbors is converted into a constrained optimization problem, and an estimate is given of the difference between the objective function values at the optimal solutions of the original problem and of the problem that uses data partition. Theoretical analysis of this estimate indicates that partitioning the data by clustering can effectively reduce this difference, which in turn ensures that the clustering-based k-nearest neighbor classification algorithm (DC-kNN) retains strong generalization ability. Experimental results on public datasets show that DC-kNN provides test instances with, to a large extent, the same k nearest neighbors as the original kNN, and thus achieves high classification accuracy.
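The abstract does not state how DC-kNN partitions the data or how it chooses which subset to search, so the following is only a minimal sketch under stated assumptions: the training set is partitioned with plain k-means, and a test instance is matched against the single cluster whose center is nearest to it. The function names (`kmeans`, `knn_indices`, `dc_knn_indices`, `majority_vote`) and the synthetic two-blob data are hypothetical and are not taken from the paper.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed partition method); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (n, n_clusters)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    # Final assignment, consistent with the returned centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)

def knn_indices(X, x, k):
    """Exact search: indices of the k rows of X closest to x."""
    d2 = ((X - x) ** 2).sum(axis=1)
    return np.argsort(d2, kind="stable")[:k]

def dc_knn_indices(X, centers, labels, x, k):
    """Partitioned search: look only inside the cluster whose center is nearest to x."""
    c = ((centers - x) ** 2).sum(axis=1).argmin()
    subset = np.flatnonzero(labels == c)
    local = knn_indices(X[subset], x, min(k, len(subset)))
    return subset[local]  # map back to indices in the full training set

def majority_vote(y, idx):
    """Most frequent class label among the selected neighbors."""
    values, counts = np.unique(y[idx], return_counts=True)
    return values[counts.argmax()]

# Synthetic data (an assumption for illustration): two well-separated classes.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(10.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

centers, labels = kmeans(X, n_clusters=2)
query = np.array([0.0, 0.0])
k = 5

exact = knn_indices(X, query, k)                      # scans all 200 instances
approx = dc_knn_indices(X, centers, labels, query, k) # scans one cluster only
overlap = len(set(exact.tolist()) & set(approx.tolist())) / k
prediction = majority_vote(y, approx)
print("neighbor overlap:", overlap, "predicted class:", prediction)
```

The `overlap` value is the fraction of exact kNN neighbors that the partitioned search also returns, which is the quantity the abstract's experiments refer to when they compare DC-kNN with the original kNN. Because the partitioned search only sees a subset of the training set, its neighbors can never be closer than the exact ones; the sketch makes no claim about how close they are in general.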

Keywords: k-nearest neighbor; data partition; local information; instance subset; clustering

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
