Review of Outlier Detection Algorithms (离群点检测算法综述)


Authors: KONG Lingchao (孔翎超); LIU Guozhu (刘国柱) [1]

Affiliation: [1] School of Information Science and Technology, Qingdao University of Science and Technology (青岛科技大学), Qingdao, Shandong 266061, China

Source: Computer Science (《计算机科学》), 2024, No. 8, pp. 20-33 (14 pages)

Funding: National Natural Science Foundation of China (61973180).

Abstract: Outlier detection is an important research direction in data mining. It aims to discover data that differ from the rest of a dataset and have potential analytical value, helping researchers identify possible problems in the data source. Outlier detection is now widely applied in fraud detection, smart healthcare, intrusion detection, fault diagnosis, and many other fields. Building on earlier work, this paper first discusses the definition of outliers, their causes, and typical application domains; reviews the advantages and limitations of classical outlier detection algorithms such as DBSCAN and LOF and of their improved variants; and analyzes the advantages of deep learning methods for outlier detection. Second, in light of the need to process massive, high-dimensional, and time series data in the current Internet context, it examines how outlier detection algorithms have developed in these new settings. Finally, it introduces evaluation metrics for outlier detection algorithms, the role of cost factors in evaluating outlier detection, and commonly used toolkits and datasets, and it summarizes the challenges facing outlier detection and its future directions.

Keywords: outlier; anomaly detection; deep learning; time series data; data mining

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
