Outlier Detection Algorithm Based on Neighborhood Average Distance  (Cited by: 1)


Authors: SHI Jinyu[1], DU Xiaohan, SUN Yuming, LI Chunhui

Affiliation: [1] School of Information Science and Technology, Dalian Maritime University, Dalian 116026

Source: Computer & Digital Engineering, 2024, No. 7, pp. 1916-1920 (5 pages)

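Funding: Supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 62103072), the China Postdoctoral Science Foundation (No. 2021M690502), and the Fundamental Research Funds for the Central Universities (No. 3132021242).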

Abstract: Outlier detection is an active topic in data mining; identifying the outliers in a data set facilitates data analysis. To improve the accuracy of data analysis and screen outliers effectively, this paper proposes an outlier detection algorithm based on neighborhood average distance. First, the sum of squared errors (SSE) is computed and the elbow method is used to determine the optimal number of clusters K. Then K is passed to bisecting K-Means, an optimized variant of K-Means, which partitions the data set into K clusters. Finally, for each cluster, the neighborhood average distance of the ε-neighborhood of the centroid is computed, and the sample points whose distance from the centroid exceeds the threshold distance are taken as the outlier set. Experimental results show that the algorithm achieves a good detection rate on standard UCI data sets.
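The final step of the abstract (per-cluster thresholding on the neighborhood average distance) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how ε or the threshold distance is chosen, so the sketch assumes ε is the median point-to-centroid distance and the threshold is a fixed multiple of the neighborhood average distance. The function names and the parameters `eps_quantile` and `factor` are hypothetical. Clustering (elbow method plus bisecting K-Means) is assumed to have been done already.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def neighborhood_average_distance(points, center, eps):
    """Mean distance to `center` over the points lying within `eps` of it."""
    inside = [d for d in (math.dist(p, center) for p in points) if d <= eps]
    return sum(inside) / len(inside) if inside else 0.0

def detect_outliers(clusters, eps_quantile=0.5, factor=3.0):
    """Return the points that lie farther from their cluster centroid than
    the threshold distance.

    ASSUMPTIONS (not stated in the abstract):
      * eps is the `eps_quantile` quantile of the point-to-centroid distances;
      * threshold = factor * neighborhood average distance.
    """
    outliers = []
    for points in clusters:
        center = centroid(points)
        dists = sorted(math.dist(p, center) for p in points)
        eps = dists[int(eps_quantile * (len(dists) - 1))]
        threshold = factor * neighborhood_average_distance(points, center, eps)
        outliers.extend(p for p in points if math.dist(p, center) > threshold)
    return outliers

# Two already-clustered groups; the first contains one distant point.
clusters = [
    [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)],
    [(20, 20), (21, 20), (20, 21), (21, 21), (20.5, 20.5)],
]
print(detect_outliers(clusters))  # [(10, 10)]
```

Because the centroid is itself pulled toward distant points, the assumed multiplier `factor` strongly affects which points are flagged; the paper's own threshold rule may differ.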

Keywords: outlier detection; bisecting K-Means; elbow method; average neighborhood distance

Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
