Multi-generator Active Learning Algorithm Based on Reverse Label Propagation and Its Application in Outlier Detection


Authors: XING Kaiyan (邢开颜); CHEN Wen (陈文) (School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China)

Affiliation: [1] School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China

Source: Computer Science (《计算机科学》), 2024, No. 4, pp. 359-365 (7 pages)

Funding: National Key Research and Development Program of China (020YFB1805405, 2019QY0800); National Natural Science Foundation of China (U19A2068, 61872255).

Abstract: The imbalance between positive and negative training samples severely limits the performance of outlier detection models. Outlier detection algorithms based on active learning address this by learning the sample distribution and automatically synthesizing outliers to balance the training data. However, traditional active-learning-based detection methods neither assess the quality of the synthetic outliers nor filter them, so the synthesized training samples contain noise that degrades the classification model. To address these problems, a multi-generator active learning algorithm based on reverse label propagation (MG-RLP) is proposed, consisting of multiple neural network generators and a discriminator for outlier boundary detection. MG-RLP uses multiple sub-generators to produce samples with multi-distribution features, preventing the mode collapse that arises when the training samples synthesized by a single generator are overly clustered. MG-RLP also applies reverse label propagation to evaluate the quality of the generated sample points and select credible synthetic samples. The selected samples are kept in the training set and used to iteratively train the discriminator, improving outlier detection performance. MG-RLP is compared with six typical outlier detection algorithms on five public datasets. The results show that it improves AUC and detection precision by 15% and 22%, respectively, which verifies its effectiveness.
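The abstract does not spell out how reverse label propagation scores a synthetic sample, so the following is only a minimal sketch of the general idea of filtering generated outliers by graph-based label propagation. It assumes a standard label-spreading iteration on a k-nearest-neighbour graph, seeded with the real labelled samples only, and keeps a synthetic point when its propagated outlier score reaches a threshold. The function names (`knn_affinity`, `propagate`, `filter_synthetic`) and the parameters `k`, `sigma`, `alpha`, and `threshold` are hypothetical and are not taken from the paper.

```python
import numpy as np

def knn_affinity(X, k=5, sigma=1.0):
    """Gaussian affinity restricted to a symmetrised k-nearest-neighbour graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Keep the k strongest neighbours of each point, then symmetrise the mask.
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(X.shape[0])[:, None], idx] = True
    mask = mask | mask.T
    return W * mask

def propagate(W, Y, alpha=0.9, n_iter=100):
    """Label spreading: F <- alpha * S @ F + (1 - alpha) * Y."""
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0
    S = W / np.sqrt(np.outer(deg, deg))  # symmetric normalisation
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y
    return F

def filter_synthetic(X_real, y_real, X_syn, threshold=0.5, k=5, sigma=1.0):
    """Keep synthetic points whose propagated outlier score is >= threshold.

    Assumption (not from the paper): y_real uses 0 = normal, 1 = outlier,
    and synthetic points start unlabelled.
    """
    X = np.vstack([X_real, X_syn])
    n_real = len(X_real)
    Y = np.zeros((len(X), 2))
    Y[np.arange(n_real), y_real] = 1.0  # seeds come from real samples only
    F = propagate(knn_affinity(X, k, sigma), Y)
    scores = F[n_real:]
    total = scores.sum(axis=1)
    # Normalised outlier score; points that received no label mass score 0.
    p_out = scores[:, 1] / np.where(total > 0, total, 1.0)
    keep = p_out >= threshold
    return X_syn[keep], p_out
```

In this sketch, a generated point that lands inside the normal cluster receives mostly normal-class mass from its neighbours and is discarded, while one that lands near known outliers is retained for discriminator training. The actual MG-RLP scoring rule may differ.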

Keywords: outlier detection; active learning; generative adversarial network; label propagation

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
