Authors: TANG Chang-cheng (唐长成), YE Zuo-chang (叶佐昌) [1]
Affiliation: [1] Institute of Microelectronics, Tsinghua University, Beijing 100083, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2019, No. 1, pp. 46-50 (5 pages)
Abstract: This paper addresses parametric optimization problems of the form min_θ f(θ, w), where θ is the variable to be optimized and w is a vector that parameterizes the problem. In practice one often has to solve a whole series of such problems that share a structure and differ only in w. We propose an efficient method that trains a single model mapping parameters to solutions, so that all problems with the same structure are solved at once. Unlike conventional approaches, training does not solve a separate optimization problem for each randomly sampled w; instead, reinforcement learning is used to accelerate training. Two networks are trained alternately. The value network is trained to fit the target loss function, and thus evaluates how good a policy is. The policy network produces the candidate solution: its output is connected to the θ input of the value network, and it is trained to minimize the value network's output. Experiments on several mathematical examples and on circuit optimization examples show that the method is effective.
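The alternating scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy objective `f(theta, w) = (theta - w)**2`, the linear policy `theta = a * w + b`, the quadratic-feature value model (a stand-in for the value network, refit by least squares at each step), the Gaussian exploration noise, and all hyperparameters.

```python
import numpy as np

def f(theta, w):
    # Hypothetical toy objective (assumption); its minimizer is theta*(w) = w.
    return (theta - w) ** 2

def features(theta, w):
    # Quadratic features of (theta, w); the value model is linear in these.
    # This simple model stands in for the value network of the abstract.
    return np.stack(
        [np.ones_like(theta), theta, w, theta**2, theta * w, w**2], axis=1
    )

def train(steps=300, batch=64, lr=0.1, noise=0.5, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 0.0, 0.0  # policy parameters: theta = a * w + b
    for _ in range(steps):
        # Sample problem parameters and perturbed policy outputs (assumed
        # exploration scheme, so the value model sees varied theta values).
        w = rng.uniform(-1.0, 1.0, size=batch)
        theta = a * w + b + noise * rng.normal(size=batch)

        # Value step: fit the value model to the observed objective values.
        c, *_ = np.linalg.lstsq(features(theta, w), f(theta, w), rcond=None)

        # Policy step: gradient descent on the value model's output,
        # differentiating V(pi(w), w) through theta = a * w + b.
        pi = a * w + b
        dV_dtheta = c[1] + 2.0 * c[3] * pi + c[4] * w
        a -= lr * np.mean(dV_dtheta * w)
        b -= lr * np.mean(dV_dtheta)
    return a, b

if __name__ == "__main__":
    a, b = train()
    print(f"learned policy: theta = {a:.4f} * w + {b:.4f}")
```

For this toy objective the learned policy should approach theta = w, that is, `a` close to 1 and `b` close to 0. A real implementation would presumably use neural networks for both models and update the value model incrementally instead of refitting it from scratch.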
Classification number: TP301.6 [Automation and Computer Technology: Computer System Architecture]