Authors: Xu Junyang, Zhang Hongmei [1], Zhang Kun
Affiliations: [1] School of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China; [2] The Survey Bureau of Hydrology and Water Resources of Changjiang Estuary, Bureau of Hydrology, Changjiang Water Resources Commission, Shanghai 200136, China
Source: Chinese Journal of Scientific Instrument (《仪器仪表学报》), 2025, No. 2, pp. 70-80 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (42176186)
Abstract: This article proposes a variable step size LMS algorithm based on deep reinforcement learning to address two problems: the difficulty of balancing convergence speed against steady-state error in the fixed step size LMS algorithm, and the heavy dependence of traditional variable step size algorithms on initial parameter selection, which entails a large and subjective tuning workload. The proposed algorithm depends only weakly on initial parameters and avoids the cumbersome tuning process. First, an algorithm model integrating deep reinforcement learning and adaptive filtering is constructed, in which a deep reinforcement learning agent controls the change of the step size factor, replacing the nonlinear function used for step size adjustment in traditional variable step size algorithms, thereby avoiding the cumbersome experimental tuning process and reducing the complexity of using the algorithm. Second, an error-based state reward and a step-size-based action reward function are proposed, and dynamic reward and negative reward mechanisms are introduced, which effectively improve the convergence speed of the algorithm. In addition, a network architecture based on an undercomplete encoder is designed to improve the inference ability of the reinforcement learning policy. Experiments show that, compared with other recent variable step size algorithms, the proposed algorithm achieves faster convergence and smaller steady-state error, quickly adjusts to a reasonable step size value under different initial parameters, and reduces the workload of experimental parameter tuning. The trained network has been applied to practical settings such as system identification, signal denoising, and filtering of water level signals in the closure-gap area of a river closure project, where it performed well, confirming the algorithm's generalization ability and further demonstrating its effectiveness.
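To make the step-size mechanism concrete, the sketch below shows a variable step size LMS update in which the step size is supplied externally at each iteration, which is the role the paper assigns to the deep reinforcement learning agent. It is a minimal illustration only: the abstract does not specify the agent's architecture, state encoding, or action space, so a simple hypothetical error-driven rule stands in for the trained agent, and the function name, threshold, and bounds (variable_step_lms, 0.1, mu_min, mu_max) are assumptions, not the authors' implementation.

```python
import numpy as np

def variable_step_lms(x, d, n_taps=8, mu_init=0.01, mu_min=1e-4, mu_max=0.1):
    """Adaptive FIR filtering of x toward desired signal d with a variable step size."""
    w = np.zeros(n_taps)              # filter weights
    mu = mu_init                      # step size, adjusted every iteration
    e = np.zeros(len(x))              # error signal
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]   # input vector [x[n], ..., x[n-n_taps+1]]
        y = w @ u                            # filter output
        e[n] = d[n] - y                      # instantaneous error
        # Stand-in policy (hypothetical): in the paper, a trained DRL agent
        # chooses the step size from the observed error state instead.
        # Grow mu while the error is large (fast convergence), shrink it
        # as the error decays (small steady-state error).
        mu = float(np.clip(mu * (1.05 if abs(e[n]) > 0.1 else 0.95), mu_min, mu_max))
        w += mu * e[n] * u                   # standard LMS weight update
    return w, e

# Usage: system identification, one of the application settings named above.
rng = np.random.default_rng(0)
h_true = rng.standard_normal(8)              # unknown FIR system to identify
x = rng.standard_normal(5000)                # white excitation
d = np.convolve(x, h_true)[:len(x)]          # desired signal = system output
w_est, e = variable_step_lms(x, d)           # w_est should approach h_true
```

The design point the abstract emphasizes is visible in the policy line: a fixed mu forces a trade-off between fast initial convergence and low steady-state error, while letting a learned controller set mu per iteration removes both the trade-off and the hand-tuned nonlinear step-size function.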
Keywords: variable step size LMS algorithm; deep reinforcement learning; adaptive filtering; reward function
Classification: TN911.7 [Electronics and Telecommunication: Communication and Information Systems]; TH701 [Electronics and Telecommunication: Information and Communication Engineering]