Author: Liu Keqin (刘克勤), Department of Mathematics, Nanjing University, Nanjing 210093
Affiliation: [1] Department of Mathematics, Nanjing University, Nanjing 210093
Source: Numerical Mathematics: A Journal of Chinese Universities (《高等学校计算数学学报》), 2020, No. 4, pp. 372-384 (13 pages)
Abstract: We investigate the Whittle index policy for a restless bandit model whose state space may be enlarged under passive actions. This model arises in many important applications and extends the classical model introduced by Whittle in 1988. In the classical model, one chooses a subset of arms to play at each time and accrues a reward determined by the states of all arms and the subset of chosen arms. The state of each arm evolves according to a Markov process whose parameters (transition matrix) may depend on whether or not the arm is selected. The objective is to maximize the time-average reward over the long term. Weber and Weiss in 1990 proved the asymptotic optimality of the Whittle index under a sufficient condition for the classical model, where Whittle's indexability was required. In this paper, we extend the Whittle index to the general model considered here. Our extension is based on policy continuation and a tie-breaking ordering of the Whittle index when new states join the system. By requiring a positive recurrent sub-state-space and boundedness of immediate rewards, we show that randomization can achieve optimality under Whittle's relaxed constraint. We further analyze the fluid dynamics of our model and show that the asymptotic optimality of the Whittle index under the strict constraint can also be extended.
Keywords: restless multi-armed bandits; Whittle index; state expansion; policy continuation; optimality under relaxed constraints; fluid model; asymptotic optimality