具有反馈延迟分布式在线复合优化的动态遗憾性能  

Dynamic Regret for Distributed Online Composite Optimization With Delayed Feedbacks

在线阅读下载全文

作  者:侯瑞捷 李修贤 易新蕾 洪奕光 谢立华[4] HOU Rui-Jie;LI Xiu-Xian;YI Xin-Lei;HONG Yi-Guang;XIE Li-Hua(Department of Control Science and Engineering,College of Electronics and Information Engineering,Tongji University,Shanghai 201800,China;National Key Laboratory of Autonomous Intelligent Unmanned System,Frontiers Science Center for Intelligent Autonomous Systems,Ministry of Education,Shanghai Research Institute for Intelligent Autonomous Systems,and Shanghai Institute of Intelligent Science and Technology,Tongji University,Shanghai 201210,China;Laboratory for Information and Decision Systems,Massachusetts Institute of Technology,Cambridge 02139,USA;School of Electrical and Electronic Engineering,Nanyang Technological University,Singapore 639798,Singapore)

机构地区:[1]同济大学电子与信息工程学院控制科学与工程系,上海201800 [2]自主智能无人系统全国重点实验室,教育部自主智能无人系统前沿科学中心,上海自主智能无人系统科学中心,上海智能科学与技术中心,同济大学,上海201210 [3]麻省理工学院信息与决策系统实验室,美国剑桥02139 [4]南洋理工大学电气与电子工程学院,新加坡新加坡639798

出  处:《自动化学报》2025年第4期835-856,共22页Acta Automatica Sinica

基  金:国家自然科学基金(62473292,62088101);上海市科技重大专项(2021SHZDZX0100)资助。

摘  要:研究分布式在线复合优化场景中的几种反馈延迟,包括梯度反馈、单点Bandit反馈和两点Bandit反馈.其中,每个智能体的局部目标函数由一个强凸光滑函数与一个凸的非光滑正则项组成.在分布式场景下,研究每个智能体具有不同时变延迟的场景.基于近端梯度下降算法,分别设计这三种延迟反馈的分布式在线复合优化算法,并且对动态遗憾上界进行分析.分析结果表示,延迟梯度反馈和延迟两点Bandit反馈的动态遗憾上界阶数在期望意义下相同,而延迟单点Bandit反馈的动态遗憾上界稍差于前两者.这表明,存在延迟时,两点Bandit反馈可以在期望意义下达到与梯度反馈相同阶数的动态遗憾上界,且在步长选择合适的情况下,三种反馈类型的平均延迟在动态遗憾上具有相同的阶数.最后通过仿真实验验证了算法的性能和理论分析结果.This paper investigates several types of feedback delays in distributed online composite optimization scenarios,including gradient feedback,one-point Bandit feedback,and two-point Bandit feedback,where each agent's local objective function consists of a strongly convex and smooth function combined with a convex nonsmooth regularizer.In the distributed scenarios,this paper investigates the scenario where each agent has a different time-varying delay.Based on the proximal gradient descent algorithm,distributed online composite optimization algorithms are designed for these three delayed feedback cases respectively and the upper bounds of dynamic regret are analyzed.The analysis results showed that the order of dynamic regret upper bound under delayed gradient feedback and delayed two-point Bandit feedback is identical in the mathematical expectation sense,while that under delayed single-point Bandit feedback is slightly worse than the former two.Therefore,under delays,two-point Bandit feedback can achieve the same order of dynamic regret upper bound as gradient feedback in the mathematical expectation sense,and the average delay for the three feedbacks has the same orders on dynamic regret when choosing suitable step sizes.Finally,the performance of the algorithms and the results of the theoretical analysis are verified through simulations experiments.

关 键 词:分布式在线凸优化 复合优化 反馈延迟 BANDIT 反馈 动态遗憾 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象