Sequential Detection Based Quickest Markov Decision Processes: Theory, Algorithms, and Applications


Authors: CHEN Zuxu; CHEN Wei [2,3,4]; LI Changkun; HAN Yuxing

Affiliations: [1] International Graduate School, Tsinghua University, Shenzhen, Guangdong 518071, China; [2] Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; [3] State Key Laboratory of Space Network and Communications, Beijing 100084, China; [4] Beijing National Research Center for Information Science and Technology, Beijing 100084, China

Source: Journal of Signal Processing, 2025, No. 3, pp. 448-471 (24 pages)

Funding: National Natural Science Foundation of China (62261160390, 62471276); Shenzhen start-up funding project (QD2023014C); Meituan research fund.

Abstract: This paper investigated joint signal processing and control methods for complex dynamical systems subject to abrupt state changes (statistical change points), observation noise, control aftereffects, and action latency, with the aim of maximizing the overall utility of delay-sensitive decision making. A unified framework, the sequential detection based quickest Markov decision process, was presented. It combines quickest change detection from statistical signal processing with the Markov decision process from stochastic optimal control, and it was applied to smart grids, disease control, and hydrology. Built on a constrained Markov decision process with a four-dimensional state, the framework selects a feasible joint detection-control policy that maximizes the expected reward, characterized as a weighted sum of income and risk, while satisfying constraints arising from operations, feasibility, and the environment. In contrast to the conventional layered architecture, in which an action is launched only after the change point has been detected, the new architecture enabled cross-layer, cross-disciplinary collaboration between signal processing and control: controllable quantities are adjusted while detection is still in progress, based on instantaneous likelihood estimation, so decisions are made in a much timelier manner. This shift brought substantial gains for dynamical or stochastic systems that are sensitive to decision or control latency and that suffer from long detection delays and/or strong aftereffects. In the smart grid, disease control, and hydrology scenarios, the joint detection-and-control strategy ("adjust while detecting") outperformed the conventional control-after-detection policy ("adjust after detecting") by a considerable margin. Finally, the potential applications of sequential detection based quickest Markov decision processes in offshore carbon sequestration (carbon capture and storage beneath the seafloor) and in network attack detection and mitigation were briefly discussed.
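
The contrast between acting after detection and acting during detection can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a Gaussian mean shift, a geometric prior on the change time with per-step probability `rho`, and two hypothetical action rules (`graded_action` and `threshold_action`) introduced here purely for illustration. The posterior recursion is the standard Bayesian (Shiryaev-type) update for a change point.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_update(prior, x, rho, pre_mean, post_mean, std):
    """One step of the posterior probability that the change has occurred.

    prior: posterior change probability after the previous observation.
    rho:   assumed per-step change probability (geometric prior).
    """
    # Probability that the change has happened before observing x.
    predicted = prior + (1.0 - prior) * rho
    num = predicted * gaussian_pdf(x, post_mean, std)
    den = num + (1.0 - predicted) * gaussian_pdf(x, pre_mean, std)
    return num / den


def posterior_trace(observations, rho=0.05, pre_mean=0.0, post_mean=1.0, std=1.0):
    """Run the recursion over a sequence and return all posteriors."""
    p = 0.0
    trace = []
    for x in observations:
        p = posterior_update(p, x, rho, pre_mean, post_mean, std)
        trace.append(p)
    return trace


def graded_action(p, max_action=1.0):
    """Hypothetical 'adjust while detecting' rule: scale action with belief."""
    return max_action * p


def threshold_action(p, threshold=0.95, max_action=1.0):
    """Hypothetical 'adjust after detecting' rule: act only once a change is declared."""
    return max_action if p >= threshold else 0.0


if __name__ == "__main__":
    obs = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
    for t, p in enumerate(posterior_trace(obs)):
        print(t, round(p, 3), round(graded_action(p), 3), threshold_action(p))
```

In this toy setting the graded rule starts responding as soon as the posterior begins to rise, whereas the threshold rule stays idle until the belief crosses the declared level. The framework summarized in the abstract chooses the policy by solving a constrained Markov decision process instead of using a fixed proportional rule.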

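Keywords: statistical signal processing; stochastic optimal control; sequential detection; quickest change-point detection; Markov decision process; constrained Markov decision process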

Classification code: G202 [Cultural Science: Communication Studies]

 
