Time series anomaly detection based on unsupervised adversarial learning  (Cited by: 8)


Authors: Shao Shikuan; Zhang Hongjun; Xiao Qinfeng; Wang Jing [1,2,4]; Liu Xiaohui; Lin Youfang

Affiliations: [1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; [2] Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; [3] TravelSky Technology Limited, Beijing 100010, China; [4] CAAC Key Laboratory of Intelligent Passenger Service of Civil Aviation, Beijing 100010, China

Source: Journal of Nanjing University (Natural Science), 2021, No. 6, pp. 1042-1052 (11 pages)

Funding: TravelSky Technology Limited; Fund of the CAAC Key Laboratory of Intelligent Passenger Service of Civil Aviation.

Abstract: Time series anomaly detection is a class-imbalanced problem: anomalies occur rarely, so anomaly labels are expensive to obtain, and unsupervised detection methods are therefore of greater practical value. However, existing methods have three shortcomings. They struggle to model complex time series, since they are either non-stochastic or require an explicit posterior distribution of the data; they lack a sound mechanism for handling missing values; and they cannot leverage prior knowledge such as a small number of labeled anomalies. To address these issues, this paper proposes SALAD (Stochastic Adversarial Learned Anomaly Detection), an unsupervised time series anomaly detection model based on generative adversarial networks and auto-encoders. In the original space, a generative adversarial network is combined with an auto-encoder, and both the discriminant loss and the absolute loss are used to reconstruct the data. In the hidden space, a generative adversarial network is also introduced to constrain the hidden variables of the auto-encoder toward the prior distribution, so that they give a more compact stochastic representation of the original data distribution. A data-imputation method introduced in the training process provides a more reasonable mechanism for handling missing values. The proposed contrast reconstruction loss enables SALAD to make full use of a small amount of labeled anomalous data. Extensive experiments on the dataset used in this paper show that, both in the completely unsupervised setting and when partial anomaly labels are used, the F1 score of the proposed model improves markedly over existing baseline methods.
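
The abstract names an absolute reconstruction loss, a missing-value mechanism, and a contrast reconstruction loss for labeled anomalies, but gives no formulas. The following is only a minimal sketch of how such pieces might fit together. It is not SALAD's actual formulation: the function names, the observation mask, the hinge-style `margin`, and the fixed `threshold` are all assumptions made for illustration, and the adversarial components are omitted entirely.

```python
from typing import List, Optional, Sequence


def masked_abs_error(x: Sequence[float], x_hat: Sequence[float],
                     observed: Sequence[bool]) -> List[float]:
    """Per-point absolute reconstruction error; missing points contribute 0."""
    return [abs(a - b) if seen else 0.0 for a, b, seen in zip(x, x_hat, observed)]


def contrast_reconstruction_loss(x: Sequence[float], x_hat: Sequence[float],
                                 observed: Sequence[bool],
                                 labeled_anomaly: Optional[Sequence[bool]] = None,
                                 margin: float = 1.0) -> float:
    """Hypothetical loss (an assumption, not the paper's definition).

    Normal points are penalized by their reconstruction error; labeled
    anomalies are penalized only when their error falls below `margin`,
    which pushes the model to reconstruct them poorly.
    """
    if labeled_anomaly is None:  # fully unsupervised case
        labeled_anomaly = [False] * len(x)
    errors = masked_abs_error(x, x_hat, observed)
    total, count = 0.0, 0
    for err, seen, is_anomaly in zip(errors, observed, labeled_anomaly):
        if not seen:  # skip missing values instead of scoring filled-in data
            continue
        total += max(0.0, margin - err) if is_anomaly else err
        count += 1
    return total / count if count else 0.0


def detect(x: Sequence[float], x_hat: Sequence[float],
           observed: Sequence[bool], threshold: float) -> List[bool]:
    """Flag observed points whose reconstruction error exceeds `threshold`."""
    errors = masked_abs_error(x, x_hat, observed)
    return [bool(seen) and err > threshold for err, seen in zip(errors, observed)]


# Toy window: the third point is a spike, the fourth is a missing value.
x = [1.0, 1.0, 5.0, 1.0]
x_hat = [1.0, 1.5, 1.0, 0.0]          # stand-in for an auto-encoder output
observed = [True, True, True, False]
labels = [False, False, True, False]  # one labeled anomaly

unsupervised_loss = contrast_reconstruction_loss(x, x_hat, observed)
labeled_loss = contrast_reconstruction_loss(x, x_hat, observed, labels)
flags = detect(x, x_hat, observed, threshold=2.0)
```

In this toy window the labeled spike already has an error above the margin, so it adds nothing to the labeled loss, while the missing point is excluded from both the loss and the detection output.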

Keywords: time series; anomaly detection; auto-encoder; generative adversarial network; unsupervised learning

Classification code: TP389.1 [Automation and Computer Technology: Computer System Architecture]

 
