Locality and Consistency Based Sequential Ensemble Method for Outlier Detection (基于邻域一致性的异常检测序列集成方法)  Cited by: 4

Authors: LIU Yi (刘意), MAO Ying-chi (毛莺池)[1,2], CHENG Yang-kun (程杨堃), GAO Jian (高建), WANG Long-bao (王龙宝)

Affiliations: [1] College of Computer and Information, Hohai University, Nanjing 211100, China; [2] Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Nanjing 211100, China

Source: Computer Science (《计算机科学》), 2022, No. 1, pp. 146-152 (7 pages)

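Funding: National Key Research and Development Program project (2018YFC0407105); Key Program of the National Natural Science Foundation of China (61832005); China Huaneng Group Key R&D Project (HNKJ17-21).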

Abstract: Outlier detection is widely used in many fields, such as network intrusion detection and credit card fraud detection. As data dimensionality increases, many irrelevant and redundant features appear; they can mask the relevant features and lead to false positives. Because high-dimensional data are sparse and subject to the distance concentration effect, traditional density- and distance-based outlier detection algorithms are no longer applicable. Most machine-learning-based outlier detection research focuses on a single model, and single models are relatively weak at resisting overfitting. Ensemble learning models generalize well and, in practice, achieve better prediction accuracy than single models. This paper proposes LCSE (Locality and Consistency Based Sequential Ensemble Method for Outlier Detection). LCSE first constructs diverse base outlier detection models, then selects outlier candidates according to global ensemble consistency, and finally selects and combines the base models' results by taking the local neighborhood correlation of the data into account. Experiments show that LCSE improves outlier detection accuracy by 20.7% on average over traditional methods, and improves performance (AUC) by 3.6% on average over the ensemble methods LSCP_AOM and iForest. LCSE therefore outperforms the other ensemble methods and the neural network methods.
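The three steps named in the abstract (diverse base models, global-consistency candidate selection, local-neighborhood model selection) can be illustrated with a small sketch. This is not the authors' LCSE implementation, since the abstract gives no algorithmic detail; it is a hypothetical LSCP-style approximation under these assumptions: kNN-distance base detectors made diverse by varying k and drawing random feature subspaces, z-score normalization, the mean normalized score as the global consensus (pseudo ground truth), the top `contamination` fraction as outlier candidates, and, for each candidate, the base detector whose scores correlate best (Pearson) with the consensus over the candidate's nearest neighbours. All function and parameter names (`knn_scores`, `lcse_like_scores`, `contamination`, `n_neighbors`, ...) are illustrative.

```python
# Hypothetical sketch in the spirit of the abstract; NOT the authors' LCSE code.
import math
import random
from statistics import mean, pstdev


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_scores(data, k, features):
    """Base detector: distance to the k-th nearest neighbour in a feature subspace."""
    proj = [[row[f] for f in features] for row in data]
    scores = []
    for i, p in enumerate(proj):
        dists = sorted(euclidean(p, q) for j, q in enumerate(proj) if j != i)
        scores.append(dists[k - 1])
    return scores


def zscore(scores):
    """Put every detector on a common scale before comparing or averaging."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy) if vx > 0 and vy > 0 else 0.0


def lcse_like_scores(data, ks=(2, 3, 5), n_subspaces=3, contamination=0.1,
                     n_neighbors=5, seed=0):
    rng = random.Random(seed)
    dim, n = len(data[0]), len(data)

    # Step 1 (assumed): diversity via different k and random feature subspaces.
    detectors = []
    for k in ks:
        for _ in range(n_subspaces):
            size = rng.randint(max(1, dim // 2), dim)
            features = sorted(rng.sample(range(dim), size))
            detectors.append(zscore(knn_scores(data, k, features)))

    # Step 2 (assumed): global consensus = mean normalized score;
    # the top `contamination` fraction of points become outlier candidates.
    pseudo = [mean(d[i] for d in detectors) for i in range(n)]
    n_cand = max(1, int(round(contamination * n)))
    candidates = sorted(range(n), key=lambda i: pseudo[i], reverse=True)[:n_cand]

    # Step 3 (assumed): for each candidate, keep the score of the detector
    # that agrees best with the consensus in the candidate's local neighbourhood.
    final = list(pseudo)
    for i in candidates:
        neigh = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclidean(data[i], data[j]))[:n_neighbors]
        target = [pseudo[j] for j in neigh]
        best = max(detectors, key=lambda d: pearson([d[j] for j in neigh], target))
        final[i] = best[i]
    return final, candidates
```

For example, on a 3x3x3 grid of inliers plus a single far point at (10, 10, 10), the far point is among the candidates and receives the highest final score. Points that are not candidates simply keep their consensus score; a real method may combine the two stages differently.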

Keywords: high-dimensional data; outlier detection; ensemble diversity; ensemble consistency; neighborhood correlation

CLC number: TP391.4 [Automation and Computer Technology / Computer Application Technology]