Authors: LIU Lei (刘磊)[1], CHEN Hao (陈浩) (College of Science, Hohai University, Nanjing 210000, China)
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》), 2023, No. 5, pp. 26-32 (7 pages)
Funding: National Natural Science Foundation of China, General Program (61773152).
Abstract: For the portfolio management problem, a portfolio framework based on a value distributional reinforcement learning algorithm (VD-MEAC) is proposed. First, a reinforcement learning framework is established with the goal of maximizing portfolio return, where the agent's action is the change in portfolio weights. Then, stock factors are selected as the state information observed by the agent. The algorithm design balances risk and return through two techniques. For risk control, the Critic network learns the entire distribution of future returns and excludes overconfident decision information, which avoids the risk introduced by overestimation. For improving return, an entropy regularization term is added to encourage the investor to explore the action space and avoid falling into a local optimum prematurely. In the numerical experiments, real stock data is used as the financial environment, and the tests are repeated multiple times to verify the stability of the strategy. The experimental results show that the VD-MEAC strategy achieves a mean return of 2.490 and a mean Sharpe ratio of 2.978, and that it clearly outperforms the control group (equal weight, CSI 300, DDPG, TD3, SAC) in return, maximum drawdown, and Sharpe ratio, which demonstrates the effectiveness of the strategy.
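The abstract compares strategies by return, maximum drawdown, and Sharpe ratio but does not say how these are computed. The following is a minimal sketch of the standard textbook definitions, not the authors' evaluation code. The function names, the zero risk-free rate, and the 252-period annualization factor are all assumptions introduced here for illustration.

```python
import math


def cumulative_return(returns):
    """Compound a sequence of simple per-period returns into a total return."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    Assumptions (not from the abstract): risk_free is a per-period rate,
    and annualization multiplies by sqrt(periods_per_year).
    """
    excess = [r - risk_free for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in excess) / (n - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("returns have zero variance")
    return mean / std * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded value, as a fraction."""
    value = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

For example, the per-period returns `[0.10, -0.50, 0.20]` compound to a total return of about -0.34, and the maximum drawdown is 0.5, reached after the second period.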
Keywords: value distributional reinforcement learning; portfolio management; quantitative investment; factor model; deep learning
Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]