Research on an unsupervised anomaly data detection method based on principal component analysis (PCA) and the deep autoencoding Gaussian mixture model (DAGMM)  Cited by: 3

Authors: LIU Xiangyu; ZHU Shibing; YANG Fan (Space Engineering University, Beijing 101416, China)

Affiliation: [1] Space Engineering University, Beijing 101416, China

Source: Modern Electronics Technique, 2023, No. 3, pp. 75-80 (6 pages)

Funding: Ministry-level key project (1900).

Abstract: In anomaly data detection, large data volumes and high-dimensional features often make data labeling difficult, introduce redundancy, and reduce algorithm efficiency. To address these problems, a new unsupervised anomaly data detection method, PCA-DAGMM, is proposed by combining the principal component analysis (PCA) feature selection algorithm with the deep autoencoding Gaussian mixture model (DAGMM). The method first uses the PCA feature selection algorithm to preprocess the data, removing redundant data that contributes little to classification performance and lowering the computing cost. The selected data is then fed into the DAGMM model for training. Experiments were carried out on the kddcup99 and CIC-IDS-2017 datasets, and the results were compared with those of several other feature selection algorithms. The experimental results show that PCA-DAGMM effectively improves classifier performance and training efficiency, which makes it suitable for network traffic anomaly detection. Compared with the DAGMM model, its F1 score improves by 4.37% on kddcup99 and 1.06% on CIC-IDS-2017, and its training time falls by 14.43% and 8%, respectively.
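The two-stage idea in the abstract (reduce the features with PCA, then score samples with a density model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data is synthetic, the helper names (`pca_fit`, `pca_transform`, `fit_gaussian`, `energy`) are hypothetical, the 99% variance cutoff and 95th-percentile threshold are arbitrary, and a single Gaussian density stands in for DAGMM, which in its full form trains an autoencoder jointly with a Gaussian mixture estimation network. It is not the authors' implementation.

```python
import numpy as np

def pca_fit(X, var_kept=0.99):
    """Return the mean and the fewest principal directions explaining var_kept of the variance."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(s))
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T

def fit_gaussian(Z):
    """Fit one Gaussian (a stand-in for DAGMM's mixture) to the reduced data."""
    mu = Z.mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False)) + 1e-6 * np.eye(Z.shape[1])
    return mu, cov

def energy(Z, mu, cov):
    """Negative log-density up to an additive constant; higher means more anomalous."""
    diff = Z - mu
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (maha + logdet)

# Synthetic data (assumption): 10 observed features driven by 3 latent factors.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 10))

def sample(n, shift=0.0):
    latent = rng.normal(size=(n, 3)) + shift
    return latent @ W + 0.05 * rng.normal(size=(n, 10))

X_train = sample(1000)            # unlabeled training data
X_normal = sample(200)            # held-out normal samples
X_anom = sample(20, shift=6.0)    # anomalies shifted in latent space

mean, comps = pca_fit(X_train)
Z_train = pca_transform(X_train, mean, comps)
mu, cov = fit_gaussian(Z_train)

# Flag samples whose energy exceeds the 95th percentile of training energies.
threshold = np.percentile(energy(Z_train, mu, cov), 95)
e_normal = energy(pca_transform(X_normal, mean, comps), mu, cov)
e_anom = energy(pca_transform(X_anom, mean, comps), mu, cov)

print("components kept:", comps.shape[0])
print("flagged normal fraction:", float((e_normal > threshold).mean()))
print("flagged anomaly fraction:", float((e_anom > threshold).mean()))
```

Fitting PCA once on the training data and reusing the same projection for scoring keeps the density model and the test samples in the same reduced space. The smaller input dimension is also where a reduction in training time would be expected to come from.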

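Keywords: unsupervised anomaly data detection; principal component analysis; feature selection; deep autoencoding Gaussian mixture model; density estimation; joint training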

Classification code: TN919-34 [Electronics and Telecommunications - Communication and Information Systems]