Semi-supervised under-sampling method for anomaly detection of industrial control data with class imbalance and overlap


Authors: Gu Zhaojun [1], Yang Xueying, Sui He, Zhang Yinuo

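Affiliations: [1] Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China; [2] College of Computer Science & Technology, Civil Aviation University of China, Tianjin 300300, China; [3] College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China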

Source: Application Research of Computers, 2025, no. 1, pp. 156-164 (9 pages)

Funding: Project supported by the National Natural Science Foundation of China (U2333201).

Abstract: Anomaly detection in industrial control systems faces three coupled problems: missing label information, class imbalance, and class overlap. Together they make it difficult for existing classifiers to detect anomalous data accurately. Existing data-level sampling methods suffer from inaccurate pseudo-labels, unstable sampling results, and low recognition rates for overlapping regions. To address this, this paper proposes an under-sampling method based on semi-supervised learning (SSLU-LP). The method combines a label propagation mechanism with a one-class classifier through heterogeneous ensembling to supply pseudo-labels for unlabeled data; builds an overlap-region detection model using a minimum spanning tree strategy; and applies an under-sampling strategy that selectively removes some majority-class samples via nearest-neighbor search. Finally, the method is combined with four classical classifiers and compared with nine hybrid algorithms on nine industrial control datasets. Experimental results show that the proposed method accurately pseudo-labels unlabeled data, efficiently and effectively detects overlapping data in imbalanced datasets, improves classifier training, and raises the classifiers' anomaly detection performance.
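The abstract names three stages (pseudo-labeling, MST-based overlap detection, nearest-neighbor under-sampling) but gives no parameters or decision rules. The following is a minimal NumPy sketch of such a pipeline under our own assumptions. It is not the authors' SSLU-LP implementation. Label spreading (Zhou et al.) stands in for the label propagation mechanism. A k-th-nearest-neighbor distance rule stands in for the one-class classifier. The "ensemble" keeps a pseudo-label only where both agree. Both endpoints of any MST edge joining two classes are treated as overlapping. An overlapping majority sample is removed when one of its k nearest neighbors is a minority sample. The label encoding (0 = normal, 1 = anomaly, -1 = unlabeled), all function names, and the parameters `sigma`, `alpha`, `k` and `q` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of a three-stage pipeline; NOT the authors' SSLU-LP code.
# Assumed label encoding: 0 = normal, 1 = anomaly, -1 = unlabeled.

def sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    d = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.maximum(d, 0.0)

def label_spreading(X, y, sigma=1.0, alpha=0.99, n_iter=100):
    """Label spreading on an RBF graph (assumed stand-in for label propagation)."""
    W = np.exp(-sq_dists(X, X) / (2 * sigma**2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(1) + 1e-12
    S = W / np.sqrt(np.outer(d, d))            # D^-1/2 W D^-1/2
    Y0 = np.stack([y == 0, y == 1], axis=1).astype(float)
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y0    # spread, then pull back toward seeds
    return F.argmax(1)

def one_class_labels(X, y, k=3, q=0.95):
    """Assumed one-class rule fitted on labeled normals only: 0 if the k-th
    nearest labeled normal lies within the q-quantile of the normals' own
    k-NN distances, else 1. Needs more than k labeled normals."""
    N = X[y == 0]
    ref = np.sort(np.sqrt(sq_dists(N, N)), 1)[:, k]   # column 0 is the point itself
    thr = np.quantile(ref, q)
    score = np.sort(np.sqrt(sq_dists(X, N)), 1)[:, k - 1]
    return (score > thr).astype(int)

def pseudo_label(X, y):
    """Keep a pseudo-label only where both models agree; others stay -1."""
    lp, oc = label_spreading(X, y), one_class_labels(X, y)
    out = y.copy()
    agree = (y == -1) & (lp == oc)
    out[agree] = lp[agree]
    return out

def mst_edges(X):
    """Prim's algorithm on the complete Euclidean graph, O(n^2)."""
    D = np.sqrt(sq_dists(X, X))
    n = len(X)
    in_tree = np.zeros(n, bool)
    in_tree[0] = True
    best, parent = D[0].copy(), np.zeros(n, int)
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = D[j] < best
        best[closer], parent[closer] = D[j][closer], j
    return edges

def overlap_mask(X, y):
    """Flag both endpoints of every MST edge that joins two different classes."""
    mask = np.zeros(len(y), bool)
    for i, j in mst_edges(X):
        if y[i] != y[j]:
            mask[i] = mask[j] = True
    return mask

def undersample(X, y, k=3):
    """Drop unlabeled rows, then drop overlapping majority samples that have
    a minority sample among their k nearest neighbors."""
    keep = y != -1
    X, y = X[keep], y[keep]
    maj = np.bincount(y).argmax()
    D = sq_dists(X, X)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, 1)[:, :k]
    drop = overlap_mask(X, y) & (y == maj) & (y[nn] != maj).any(1)
    return X[~drop], y[~drop]
```

Calling `undersample(X, pseudo_label(X, y))` on a feature matrix `X` and a partially labeled vector `y` returns a rebalanced training set. Samples on which the two pseudo-labelers disagree are discarded rather than guessed. This is one conservative design choice among several possible ones.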

Keywords: industrial control system; class imbalance; class overlap; semi-supervised learning; anomaly detection

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
