Unsupervised Anomaly Detection Method for High-dimensional Big Data Analysis

Cited by: 11

Authors: ZOU Cheng-ming (邹承明); CHEN De (陈德) [2]

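Affiliations: [1] Hubei Key Laboratory of Transportation Internet of Things Technology, Wuhan 430070, China; [2] School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China; [3] Peng Cheng Laboratory, Shenzhen, Guangdong 518000, China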

Source: Computer Science (《计算机科学》), 2021, No. 2, pp. 121-127 (7 pages)

Funding: National Key Research and Development Program of China (2018YFC0704300).

Abstract: Unsupervised anomaly detection on high-dimensional data is one of the major challenges in machine learning. Previous approaches that combine a single deep auto-encoder with density estimation have made significant progress, but because they generate the low-dimensional representation with only one deep auto-encoder, that representation may not carry enough information for the subsequent density estimation task. To address this problem, this paper proposes a mixed auto-encoding Gaussian mixture model (MAGMM). MAGMM replaces the single deep auto-encoder with a mixture of auto-encoders that generates concatenated low-dimensional representations, so that it can preserve key information from the specific cluster to which an input sample belongs. In addition, it uses an allocation network to constrain the mixture of auto-encoders, so that each sample is assigned to one dominant auto-encoder. With these mechanisms, MAGMM avoids becoming trapped in local optima and reduces the reconstruction error, which facilitates the density estimation task and improves the accuracy of anomaly detection on high-dimensional data. Experimental results show that the proposed method outperforms DAGMM, improving the standard F1 score by up to 29%.
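The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class `MixedAutoEncoderSketch`, the function `gmm_energy`, the single linear layers with random untrained weights, the allocation-weighted reconstruction, and the diagonal-covariance mixture are all assumptions made for the example, since the abstract does not specify the network architecture or the training objective.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class MixedAutoEncoderSketch:
    """Hypothetical stand-in: several linear auto-encoders plus an allocation layer."""

    def __init__(self, input_dim, latent_dim, n_encoders, seed=0):
        rng = random.Random(seed)
        self.encoders = [random_matrix(latent_dim, input_dim, rng)
                         for _ in range(n_encoders)]
        self.decoders = [random_matrix(input_dim, latent_dim, rng)
                         for _ in range(n_encoders)]
        # Allocation layer: one logit per auto-encoder (assumed to be a single layer).
        self.allocator = random_matrix(n_encoders, input_dim, rng)

    def represent(self, x):
        """Return (concatenated latents + relative reconstruction error, allocation weights)."""
        weights = softmax(linear(self.allocator, x))
        latents = [linear(enc, x) for enc in self.encoders]
        recons = [linear(dec, z) for dec, z in zip(self.decoders, latents)]
        # Assumption: the reconstruction is the allocation-weighted sum of decoder outputs.
        x_hat = [sum(w * r[i] for w, r in zip(weights, recons))
                 for i in range(len(x))]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
        norm = math.sqrt(sum(a * a for a in x)) or 1.0
        concatenated = [v for z in latents for v in z]
        return concatenated + [err / norm], weights


def gmm_energy(z, mix, means, variances):
    """Negative log-likelihood of z under a diagonal-covariance Gaussian mixture.

    A higher energy means a lower estimated density, i.e. a more anomalous sample.
    """
    log_terms = []
    for phi, mu, var in zip(mix, means, variances):
        log_p = math.log(phi)
        for v, m, s2 in zip(z, mu, var):
            log_p += -0.5 * math.log(2 * math.pi * s2) - 0.5 * (v - m) ** 2 / s2
        log_terms.append(log_p)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


if __name__ == "__main__":
    model = MixedAutoEncoderSketch(input_dim=4, latent_dim=2, n_encoders=3)
    rep, alloc = model.represent([0.5, -1.0, 2.0, 0.0])
    dominant = max(range(len(alloc)), key=lambda k: alloc[k])
    print("representation length:", len(rep), "dominant auto-encoder:", dominant)
```

In a trained model, the mixture parameters passed to `gmm_energy` would be estimated from the representations of the training samples, and a sample would be flagged as anomalous when its energy exceeds a chosen threshold. The weights here are random, so the sketch shows only how the pieces connect, not any detection performance.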

Keywords: data mining; unsupervised anomaly detection; dimensionality reduction; Gaussian mixture model; density estimation

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
