Graph Neural Network-based Adversarial Policy Detection Algorithm for Multi-agent Reinforcement Learning


Authors: SUN Qining; GUI Zhiming[1]; LIU Yanfang[2]; FAN Xinxin; LU Yunfeng

Affiliations: [1] School of Computer Science, Beijing University of Technology, Beijing 100124, China; [2] School of Computer Science and Engineering, Beihang University, Beijing 100083, China; [3] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [4] School of Reliability and Systems Engineering, Beihang University, Beijing 100088, China

Source: Computer and Modernization (《计算机与现代化》), 2025, No. 4, pp. 42-49 (8 pages)

Fund: Independent research project of the State Key Laboratory of Complex & Critical Software Environment (SKLSDE-2023ZX-17)

Abstract: In multi-agent environments, reinforcement learning models are vulnerable to adversarial attacks. Among these, attacks based on adversarial policies are particularly hard to defend against because they do not directly modify the victim's observations. To address this problem, this paper proposes a graph neural network-based adversarial policy detection algorithm that identifies malicious behavior among agents. During agent collaboration, surrogate adversarial policies are used to train a graph neural network as an adversarial policy detector, which computes trust scores for the other agents from each agent's local observations. The method provides two granularities of detection: game-level detection identifies adversarial policies with very high accuracy, while time-step-level detection can flag adversarial attacks early in a game. A series of experiments on the StarCraft platform shows that the proposed method achieves an AUC of up to 1.0 when detecting state-of-the-art adversarial-policy attacks, outperforming the best existing detection methods. It also detects adversarial policies faster than existing methods, flagging an attack as early as the 5th time step. Applied to adversarial defense, it raises the win rate of attacked games by up to 61 percentage points. In addition, the experimental results show that the algorithm generalizes well: without retraining, the detector can be used directly to detect observation-based adversarial attacks. The proposed method thus provides an effective adversarial attack detection mechanism for reinforcement learning models in multi-agent environments.
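The abstract describes the detection pipeline at a high level: a graph neural network aggregates what teammates observe about each agent into a per-time-step trust score, and the score stream is then read at two granularities (time-step level and game level). The following is a minimal pure-Python sketch of that idea only; the one-round mean-aggregation "GNN", the weights, and the 0.5 thresholds are illustrative assumptions, not the paper's model.

```python
import math

def trust_scores(obs_by_agent, weights):
    """Score each agent in [0, 1] from teammates' local observations.
    Stand-in for the paper's GNN detector: one round of mean aggregation
    over neighbour feature vectors, then a logistic readout."""
    scores = {}
    for agent, neigh_feats in obs_by_agent.items():
        # Mean-aggregate the feature vectors other agents observed about `agent`.
        agg = [sum(col) / len(col) for col in zip(*neigh_feats)]
        z = sum(w * x for w, x in zip(weights, agg))
        scores[agent] = 1.0 / (1.0 + math.exp(-z))
    return scores

def detect(score_stream, step_threshold=0.5, game_threshold=0.5):
    """Two granularities of detection over one agent's trust-score stream:
    - time-step level: index of the first step whose score drops below
      step_threshold (None if it never does), enabling early detection;
    - game level: label the whole game adversarial if the mean score
      is below game_threshold."""
    first_flag = next(
        (t for t, s in enumerate(score_stream) if s < step_threshold), None
    )
    game_is_adversarial = sum(score_stream) / len(score_stream) < game_threshold
    return first_flag, game_is_adversarial
```

For example, a teammate that behaves cooperatively for three steps and then switches to an adversarial policy produces a score stream whose first low score pinpoints the switch, while the game-level mean still labels the whole episode.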

Keywords: reinforcement learning; multi-agent system; adversarial attack; adversarial detection; graph neural network

CLC number: TP391 (Automation and Computer Technology: Computer Application Technology)

 
