Authors: 马智慧 (MA Zhi-hui); 苏晓明 (SU Xiao-ming); 李桂君 (LI Gui-jun); 田振宇 (TIAN Zhen-yu)
Affiliations: [1] Science and Technology Department, Liaoyang Campus, Shenyang University of Technology, Liaoyang 111003, Liaoning, China; [2] Educational Administration Department, Shenyang University of Technology, Liaoyang 111003, Liaoning, China; [3] School of Chemical Process Automation, Shenyang University of Technology, Liaoyang 111003, Liaoning, China; [4] Engineering Training Center, Liaoyang Campus, Shenyang University of Technology, Liaoyang 111003, Liaoning, China
Source: Control Engineering of China (《控制工程》), 2021, No. 9, pp. 1893-1901 (9 pages)
Funding: National Natural Science Foundation of China (61074005).
Abstract: Heuristic dynamic programming (HDP) is one implementation of approximate dynamic programming (ADP) that integrates neural networks, dynamic programming, and reinforcement learning. However, existing HDP algorithms assume that the internal dynamics of the system are completely known, a condition that is extremely restrictive in practical engineering systems. To address this problem, an HDP algorithm based on an iterative-step neural network training strategy is proposed. The algorithm uses fixed-point training and obtains the control from the derivative of the state-dependent performance index. The critic network approximates the value function, and the action network approximates the optimal control policy. The algorithm can therefore be executed without knowledge of the system's internal dynamics. Its effectiveness is verified on an introductory nonlinear system example and on the control of a ball and beam system.
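For orientation, the HDP iteration summarized in the abstract is commonly written in the ADP literature as the recursion below. This is the standard textbook form, not taken from the paper: the input-affine system, the quadratic utility, and the symbols f, g, Q, R, V_i, and u_i are all assumptions made for illustration, and the paper's own formulation may differ.

```latex
% Standard HDP value iteration (textbook form; an illustrative assumption,
% not the paper's stated equations).
% Assumed system:  x_{k+1} = f(x_k) + g(x_k) u_k
% Assumed utility: U(x_k, u_k) = x_k^{\top} Q x_k + u_k^{\top} R u_k,
%                  with Q \succeq 0 and R \succ 0
\begin{align}
V_0(x_k) &= 0, \\
u_i(x_k) &= \arg\min_{u_k} \bigl\{ U(x_k, u_k) + V_i(x_{k+1}) \bigr\}
          = -\tfrac{1}{2} R^{-1} g(x_k)^{\top}
            \frac{\partial V_i(x_{k+1})}{\partial x_{k+1}}, \\
V_{i+1}(x_k) &= U\bigl(x_k, u_i(x_k)\bigr)
              + V_i\bigl(f(x_k) + g(x_k)\, u_i(x_k)\bigr).
\end{align}
```

In this form the critic network would approximate V_i and the action network would approximate u_i. The closed-form expression for u_i is the first-order optimality condition, so it is implicit in u_k through x_{k+1}; it shows how a control can be obtained from the derivative of the value function.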
Keywords: heuristic dynamic programming; optimal control; performance index function; neural network; nonlinear ball and beam system
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]