Application of Improved k-Nearest Neighbor Algorithm in Massive Data Mining (改进的k最邻近算法在海量数据挖掘中的应用)

Cited by: 13


Authors: HUANG Wenxiu (黄文秀), TANG Chaochen (唐超尘), SHEN Xianhao (神显豪) [3], ZHOU Shucheng (周术诚) [4]

Affiliations: [1] School of Technology, Fuzhou Technology and Business College, Fuzhou 350715, Fujian, China; [2] School of Telecommunications Engineering, Xidian University, Xi'an 710071, Shaanxi, China; [3] Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541004, Guangxi, China; [4] College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China

Source: Journal of University of Jinan (Science and Technology) (《济南大学学报(自然科学版)》), 2021, No. 1, pp. 24-28 (5 pages)

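Funding: National Natural Science Foundation of China (61741303); Guangxi Key Research and Development Program (2017AC05027); Guangxi Natural Science Foundation (2018GXNSFAA294061); Guangxi Key Laboratory of Embedded Technology and Intelligent System Project (2017-2-5); Fujian Province Education and Research Project for Young and Middle-aged Teachers (JT180867); Fujian Province Undergraduate Education and Teaching Reform Research Project (FBJG20190171).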

Abstract: To improve the efficiency and accuracy of data mining, the k-nearest neighbor algorithm was combined with a sample equalization strategy and applied to massive data mining. The text of the sample set was first analyzed to find the densely distributed regions of the sample neighborhoods, and these dense regions were pruned to balance the sample distribution. The traditional k-nearest neighbor algorithm was then run on the equalized samples, and classification results were obtained according to weights. Finally, k-nearest neighbor algorithms with different k values were simulated on examples. The results show that, on the same data samples, the improved k-nearest neighbor algorithm achieves higher classification accuracy and classification efficiency than the other classification algorithms it was compared with.
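The abstract names two stages (pruning dense regions, then weighted k-nearest neighbor voting) but gives neither the pruning rule nor the weighting formula. The following Python sketch illustrates the general idea under stated assumptions and is not the authors' implementation: the greedy same-class density check, the `radius` and `max_neighbors` parameters, the inverse-distance weights, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prune_dense_regions(samples, labels, radius, max_neighbors):
    """Assumed pruning rule: keep a sample only if fewer than
    max_neighbors already-kept samples of the same class lie within radius."""
    kept_x, kept_y = [], []
    for x, y in zip(samples, labels):
        close = sum(
            1
            for kx, ky in zip(kept_x, kept_y)
            if ky == y and euclidean(x, kx) <= radius
        )
        if close < max_neighbors:
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


def weighted_knn_predict(train_x, train_y, query, k):
    """Classify query by inverse-distance-weighted voting among
    its k nearest training samples (assumed weighting scheme)."""
    nearest = sorted(
        ((euclidean(query, x), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)


# Toy data: class "a" is densely clustered, class "b" is sparse.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
           (1.0, 1.0), (1.5, 1.0)]
labels = ["a", "a", "a", "a", "a", "b", "b"]

kept_x, kept_y = prune_dense_regions(samples, labels, radius=0.2, max_neighbors=2)
pred_a = weighted_knn_predict(kept_x, kept_y, (0.2, 0.2), k=3)
pred_b = weighted_knn_predict(kept_x, kept_y, (1.1, 0.9), k=3)
print(len(kept_x), pred_a, pred_b)
```

On this toy data the pruning step thins the dense class so that both classes contribute the same number of training samples, which also reduces the number of distance computations per query. Whether this matches the paper's notion of sample equalization cannot be confirmed from the abstract alone.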

Keywords: data mining; sample optimization; k-nearest neighbor algorithm; sample equalization; dense neighborhood region

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
