Outlier detection algorithm based on affinity propagation  (Cited by: 9)


Authors: Zhang Qianqian; Yu Jiong [1,2]; Li Ziyang; Pu Yonglin

Affiliations: [1] School of Software, Xinjiang University, Urumqi 830091, China; [2] College of Information Science & Engineering, Xinjiang University, Urumqi 830046, China

Source: Application Research of Computers (《计算机应用研究》), 2021, No. 6, pp. 1662-1667 (6 pages)

Funding: National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078).

Abstract: Outliers are objects whose attributes differ from those of normal points. Outlier detection is used across industries to maintain data purity and safeguard operations. Most existing algorithms identify outliers with traditional distance-based or density-based methods. The proposed algorithm assigns each object an "isolation degree", that is, the degree to which the point is isolated relative to its neighbouring points, and identifies outliers by sorting these values, which the authors report is more efficient than traditional algorithms. Building on the AP (affinity propagation) clustering algorithm, the paper proposes APO (outlier detection algorithm based on affinity propagation), an algorithm that can detect anomalous data points. APO adds an isolation-degree module that computes and processes the isolation information of the sample points, and introduces an amplification factor to make the difference between outliers and normal points more pronounced; increasing the algorithm's sensitivity to outliers in this way improves its accuracy. Comparative experiments on simulated and real datasets show that APO is more sensitive to outliers than the AP algorithm and detects them more accurately. APO also clusters the data while detecting outliers, a capability that, according to the authors, other detection algorithms lack.
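The abstract does not give the formulas for the isolation degree or the amplification factor, so the following is only an illustrative sketch of the general idea of scoring each point against its neighbours, amplifying the differences, and sorting. It is not the authors' APO algorithm and does not implement AP message passing. The function names, the k-nearest-neighbour definition of isolation, the median normalisation, and the use of an exponent as the amplification factor are all assumptions made for illustration.

```python
import math

def isolation_scores(points, k=3, amplification=2.0):
    """Hypothetical isolation degree (an assumption, not the paper's formula).

    For each point: mean distance to its k nearest neighbours, divided by a
    median-like reference over all points, then raised to `amplification`
    so that ratios above 1 grow and ratios below 1 shrink.
    Assumes len(points) > k and that the points are not all identical.
    """
    n = len(points)
    mean_knn = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    ref = sorted(mean_knn)[n // 2]  # reference scale taken from the middle of the ranking
    if ref == 0:
        raise ValueError("reference scale is zero; too many duplicate points")
    return [(d / ref) ** amplification for d in mean_knn]

def rank_outliers(points, top_n=1, k=3, amplification=2.0):
    """Return the indices of the top_n points with the highest isolation score."""
    scores = isolation_scores(points, k, amplification)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:top_n]

# Toy data: a tight group near the origin plus one distant point (index 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(rank_outliers(points, top_n=1))  # → [5]
```

In this sketch the exponent plays the role the abstract attributes to the amplification factor: it widens the gap between isolated and non-isolated points before sorting. How APO actually combines the isolation information with the AP clustering step is described in the full paper, not here.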

Keywords: outlier detection; clustering algorithm; data mining; affinity propagation

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
