Portfolio management based on value distributional maximum entropy Actor-Critic algorithm (cited by: 5)

Authors: LIU Lei; CHEN Hao (College of Science, Hohai University, Nanjing 210000, China)

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2023, No. 5, pp. 26-32 (7 pages)

Funding: National Natural Science Foundation of China, General Program (61773152).

Abstract: To address the portfolio management problem, a portfolio framework based on a value distributional reinforcement learning algorithm (VD-MEAC) is proposed. First, a reinforcement learning framework is established with the goal of maximizing portfolio return, in which the agent's action is the change in portfolio weights. Then, stock factors are selected as the state information observed by the agent. In the algorithm design, risk and return are balanced through novel techniques. For risk control, the Critic network learns the entire distribution of future returns and excludes overconfident decision information, thereby avoiding the risk caused by overestimation. For improving returns, an entropy regularization term is added to encourage the investor to explore the action space and to avoid falling into a local optimum prematurely. In the numerical experiments, real stock data are used as the financial environment, and multiple test runs are performed to verify the stability of the strategy. The experimental results show that the VD-MEAC strategy achieves a mean return of 2.490 and a mean Sharpe ratio of 2.978, and that it clearly outperforms the control group (equal weight, CSI 300, DDPG, TD3, SAC) in terms of return, maximum drawdown, and Sharpe ratio, which demonstrates the effectiveness of the strategy.
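The abstract does not give the update equations, so the following is only a minimal sketch of how the two ideas it names could be combined in a critic target: a distributional critic whose largest return estimates are discarded, plus an entropy term. It assumes the critic outputs quantile estimates, that "excluding overconfident information" means dropping the top few pooled quantiles, and that the entropy bonus takes the usual soft form of subtracting alpha times the log-probability of the next action. The function name `truncated_entropy_target` and the parameters `gamma`, `alpha`, and `drop_top` are hypothetical and are not taken from the paper.

```python
import numpy as np

def truncated_entropy_target(reward, next_quantiles, next_log_prob,
                             gamma=0.99, alpha=0.2, drop_top=2, done=False):
    """Build target atoms for a quantile-based critic (illustrative only).

    next_quantiles: array-like of shape (n_critics, n_quantiles) with each
        critic's quantile estimates of the return at the next state-action.
    next_log_prob: log-probability of the next action under the policy.
    """
    # Pool the quantile estimates of all critics and sort them ascending.
    pooled = np.sort(np.asarray(next_quantiles, dtype=float).ravel())
    if not 0 <= drop_top < pooled.size:
        raise ValueError("drop_top must be in [0, number of pooled quantiles)")
    # Assumption: the most optimistic estimates are the "overconfident" ones,
    # so the largest drop_top values are discarded.
    if drop_top > 0:
        pooled = pooled[:-drop_top]
    # Assumption: entropy regularization enters as a soft value,
    # i.e. each atom is reduced by alpha * log pi(a'|s').
    soft = pooled - alpha * next_log_prob
    # One-step bootstrap; no bootstrapping past a terminal step.
    discount = 0.0 if done else gamma
    return reward + discount * soft

# Usage: two critics with three quantiles each, top two estimates dropped.
target = truncated_entropy_target(
    reward=1.0,
    next_quantiles=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    next_log_prob=-1.0,
    gamma=0.9, alpha=0.5, drop_top=2,
)
print(target)
```

Dropping the upper tail lowers the mean of the target distribution, which is the mechanism by which such a scheme would counteract overestimation; how many atoms to drop is a tuning choice in this sketch.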

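Keywords: value distributional reinforcement learning; portfolio management; quantitative investment; factor model; deep learning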

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
