An Outlier Detection Algorithm in Big Data Based on Improved KNN

Cited by: 4

Authors: 黄建理 (Huang Jianli), 杜金燃 (Du Jinran), 谢家全 (Xie Jiaquan), 秦科 (Qin Ke)

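Affiliation: [1] Smart Grid Research Institute, Electric Power Research Institute of China Southern Power Grid Co., Ltd., Guangzhou, Guangdong 510080, China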

Source: Computer and Modernization (《计算机与现代化》), 2017, No. 5, pp. 67-70, 75 (5 pages)

Funding: Science and Technology Support Program of the Sichuan Provincial Department of Science and Technology (2013GZ0141)

Abstract: The KNN algorithm has two shortcomings in big-data outlier detection: it handles high-dimensional data poorly, and its time complexity is too high. To address both, this paper proposes a classification method based on AOR (Attribute Overlapping Rate) and uses it to improve the KNN algorithm. First, the data undergo AOR-based dimensionality reduction, which greatly increases the number of dimensions that can be processed. Then the traditional KNN algorithm is improved by pruning, which eliminates a large amount of invalid computation. Experimental results show that the proposed algorithm substantially improves running efficiency and accuracy on large, high-dimensional data samples.
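The abstract does not define AOR or state the exact pruning rule, so the sketch below does not reproduce the authors' method. It illustrates the general idea of pruned KNN outlier detection with a well-known stand-in: each point is scored by the distance to its k-th nearest neighbour, and the nested-loop pruning rule popularized by ORCA (Bay and Schwabacher, KDD 2003) skips points that provably cannot be among the top-n outliers. The function name `knn_outliers_pruned` and its parameters `k`, `n` and `seed` are hypothetical, Euclidean distance is assumed, and the AOR-based dimensionality-reduction step is omitted.

```python
import heapq
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_outliers_pruned(
    data: List[Point], k: int, n: int, seed: int = 0
) -> Tuple[List[Tuple[float, int]], int]:
    """Hypothetical sketch (not the paper's AOR/pruning method).

    Returns the n points with the largest distance to their k-th nearest
    neighbour as (score, index) pairs in descending order, plus the number
    of points whose scan was abandoned early by pruning.
    """
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random scan order makes early exits likelier
    top: List[Tuple[float, int]] = []   # min-heap of (score, index): current top-n outliers
    cutoff = 0.0                        # weakest score in `top` once it holds n entries
    pruned = 0
    for i in range(len(data)):
        neigh: List[float] = []         # max-heap (negated) of the k smallest distances so far
        is_candidate = True
        for j in order:
            if j == i:
                continue
            d = math.dist(data[i], data[j])  # Euclidean distance (assumed metric)
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # The running k-th-NN distance only shrinks as more points are seen,
            # so once it drops below the cutoff, point i cannot enter the top n.
            if len(neigh) == k and len(top) == n and -neigh[0] < cutoff:
                is_candidate = False
                pruned += 1
                break
        if is_candidate and len(neigh) == k:
            score = -neigh[0]           # exact distance to the k-th nearest neighbour
            if len(top) < n:
                heapq.heappush(top, (score, i))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, i))
            if len(top) == n:
                cutoff = top[0][0]      # cutoff never decreases
    return sorted(top, reverse=True), pruned


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    outliers, skipped = knn_outliers_pruned(pts, k=2, n=1)
    print([idx for _, idx in outliers])  # → [5]
```

Pruning is safe because a point's true k-th-neighbour distance is at most its running value, and the cutoff only rises, so an abandoned point could never have displaced a top-n outlier. Inlying points tend to find k close neighbours early in a random scan, which is where the savings come from.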

Keywords: big data; KNN; dimensionality reduction; attribute overlapping rate; pruning

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
