检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:张文韬 张保勇[1] 袁德明[1] 徐胜元[1] ZHANG Wen-Tao;ZHANG Bao-Yong;YUAN De-Ming;XU Sheng-Yuan(School of Automation,Nanjing University of Science and Technology,Nanjing 210094)
出 处:《自动化学报》2025年第4期857-874,共18页Acta Automatica Sinica
基 金:国家自然科学基金(62273181,62373190,62221004)资助。
摘 要:本文研究了多智能体时变网络上基于Bandit反馈的分布式在线鞍点问题,其中每个智能体通过本地计算和局部信息交流去协作最小化全局损失函数.在Bandit反馈下,包括梯度在内的损失函数信息是不可用的,每个智能体仅能获得和使用在某决策或其附近产生的函数值.为此,结合单点梯度估计方法和预测映射技术,提出一种非欧几里得意义上的分布式在线Bandit鞍点优化算法.以动态鞍点遗憾作为性能指标,对于一般的凸−凹损失函数,建立了遗憾上界并在某些预设条件下确保所提算法的次线性收敛.此外,考虑到在迭代优化中计算优化子程序的精确解通常较为困难,进一步扩展一种基于近似计算方法的算法变种,并严格分析精确度设置对扩展算法遗憾上界的影响.最后,通过一个目标跟踪案例对算法的有效性和先进性进行仿真验证.This paper studies the distributed online saddle point problem with Bandit feedback over a multi-agent time-varying network,in which each agent collaborates to minimize the global loss function through local calculation and local information exchange.Under Bandit feedback,loss function information including gradients is not available,and each agent can only obtain and use the function value generated by a decision or decisions near it.To this end,a distributed online Bandit saddle point optimization algorithm in a non-Euclidean sense is proposed by combining the one-point gradient estimation method and the predictive mapping technique.Taking the expected dynamic saddle point regret as the performance metric,we establish the related regret upper bound for the general convex-concave loss functions and ensure that the proposed algorithm converges sublinearly under certain preconditions.In addition,considering that computing the exact solutions of the optimization oracles is usually difficult in iterative optimization,this paper further expands an algorithm variant based on an approximate computation method,and rigorously analyzes the impact of precision settings on the regret upper bound of the expanded algorithm.Finally,the effectiveness and advancement of the proposed algorithms are verified through a simulation example of target tracking.
关 键 词:BANDIT 反馈 分布式优化 在线鞍点问题 镜面下降 动态鞍点遗憾
分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.33