Outlier detection algorithm for high dimensional data stream (面向高维流数据的离群值检测算法)


Authors: LIANG Chang-hao (梁昌好); TONG Ying-hua (童英华) [1,2]; FENG Zhong-ling (冯忠岭) [3]

Affiliations: [1] School of Computer, Qinghai Normal University, Xining 810008, Qinghai, China; [2] The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, Qinghai, China; [3] School of Physics and Electronic Information, Qinghai Normal University, Xining 810008, Qinghai, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 5, pp. 1406-1412 (7 pages)

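Funding: National Natural Science Foundation of China (61862055); Hebei IoT Monitoring Engineering Technology Research Center Fund (3142016020); Qinghai Provincial Key Laboratory of IoT Fund (2020-ZJ-Y16).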

Abstract: The cumulative local outlier factor (C_LOF) algorithm can effectively handle concept drift in data streams and overcome the camouflage (masquerading) problem in outlier detection, but its time complexity is high when processing high-dimensional data. To address this, a cumulative local outlier factor algorithm based on projection indexed nearest neighbors (PINN_C_LOF) is proposed. A sliding window maintains the active data points; when a new data point arrives or an old data point expires, the projection indexed nearest neighbor (PINN) method is used to incrementally update the neighbors of the affected data points in the window. Experimental results show that, when detecting outliers in high-dimensional streaming data, the time complexity of PINN_C_LOF is significantly lower than that of C_LOF while detection accuracy is maintained.
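The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea, not the authors' PINN_C_LOF implementation. It assumes a Gaussian random projection, a fixed-size window, and a two-step neighbor search (candidates chosen in the projected space, then re-ranked by exact distance in the original space). For brevity it recomputes neighbors on demand instead of incrementally updating only the affected points, and it scores points with the plain local outlier factor rather than the cumulative variant, whose formula the abstract does not state. The class name `WindowedPinnLof` and all parameter names are hypothetical.

```python
import math
import random
from collections import deque


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class WindowedPinnLof:
    """Illustrative sketch: sliding window + projection-filtered kNN + plain LOF."""

    def __init__(self, dim, proj_dim, window_size, k, n_candidates, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(proj_dim)
        # Assumed projection: a dense Gaussian random matrix (proj_dim x dim).
        self.matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
                       for _ in range(proj_dim)]
        # deque(maxlen=...) drops the oldest point when a new one arrives.
        self.window = deque(maxlen=window_size)
        self.k = k
        self.n_candidates = n_candidates

    def _project(self, x):
        return tuple(sum(w * v for w, v in zip(row, x)) for row in self.matrix)

    def add(self, x):
        """Insert a point, storing both its original and projected form."""
        x = tuple(x)
        self.window.append((x, self._project(x)))

    def knn(self, i):
        """Approximate kNN of window[i] (i >= 0) as sorted (distance, index) pairs."""
        orig_i, proj_i = self.window[i]
        others = [j for j in range(len(self.window)) if j != i]
        # Step 1: pick candidates by distance in the low-dimensional space.
        others.sort(key=lambda j: euclid(proj_i, self.window[j][1]))
        cand = others[: self.n_candidates]
        # Step 2: re-rank the candidates by exact distance in the original space.
        ranked = sorted((euclid(orig_i, self.window[j][0]), j) for j in cand)
        return ranked[: self.k]

    def k_distance(self, i):
        return self.knn(i)[-1][0]

    def lrd(self, i):
        """Local reachability density: inverse of the mean reachability distance."""
        nbrs = self.knn(i)
        reach = sum(max(self.k_distance(j), d) for d, j in nbrs)
        return len(nbrs) / max(reach, 1e-12)

    def lof(self, i):
        """Plain LOF of window[i]; needs at least two points in the window."""
        nbrs = self.knn(i)
        return sum(self.lrd(j) for _, j in nbrs) / (len(nbrs) * self.lrd(i))
```

In this sketch a LOF value near 1 indicates a point whose local density matches that of its neighbors, while a much larger value flags a likely outlier. The brute-force sort in step 1 stands in for a real spatial index over the projected points, which is where a practical implementation would save time.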

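Keywords: high-dimensional data stream; outlier detection; cumulative local outlier factor; time complexity; projection indexed nearest neighbor; local outlier factor; Internet of Things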

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
