Affiliations: [1] School of Automation, Southeast University, Nanjing 210096 [2] College of Engineering, Nanjing Agricultural University, Nanjing 210031 [3] Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096
Source: Systems Engineering-Theory & Practice (《系统工程理论与实践》), 2014, Issue 7, pp. 1885-1894 (10 pages)
Funding: Key Program of the National Natural Science Foundation of China (60934008); National Natural Science Foundation of China (71101072, 71301077); Excellent Doctoral Dissertation Fund of Southeast University (YBJJ1215)
Abstract: An adaptive scheduling strategy based on state-action uncertainty bias based Q-learning (SAUBQ-learning) is proposed for adaptive scheduling in a knowledgeable manufacturing environment. To address the slow convergence and long training time of conventional Q-learning, a state uncertainty measure is defined via information entropy, and on that basis a Q-learning action bias information function is defined. Through a heuristic design of the Q-learning reward function, the bias information is integrated into the learning system as an additional reward, and the convergence and optimal-policy invariance of SAUBQ-learning are proved. During learning, Q-learning adjusts its search space according to the bias information, reducing the number of effective state-action pairs that must be explored; the bias information is in turn continually updated from the Q-learning results, which prevents incorrect guidance. Simulation results show that the strategy adapts well to dynamic environments, converges quickly in large state spaces, and improves scheduling efficiency.
CLC Number: TH165 [Mechanical Engineering - Machine Manufacturing and Automation]
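The abstract describes the mechanism only at a high level: state uncertainty is measured by information entropy and folded into Q-learning as an additional (shaping) reward. The sketch below illustrates that general idea under stated assumptions; it is not the paper's actual algorithm. The names softmax_entropy and saubq_update, and the bias weight beta, are hypothetical and introduced here purely for illustration.

```python
import math
from collections import defaultdict

def softmax_entropy(q_values):
    """Shannon entropy of the softmax distribution over one state's Q-values.
    Used here as a stand-in for the paper's state uncertainty measure
    (an assumption; the paper's exact definition is not reproduced)."""
    m = max(q_values)
    exps = [math.exp(q - m) for q in q_values]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)

def saubq_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, beta=0.5):
    """One tabular Q-learning step with an entropy-based bias added to the
    reward as a shaping term, in the spirit of the strategy described above.
    beta (hypothetical) scales how strongly the bias steers exploration."""
    uncertainty = softmax_entropy([Q[(s_next, b)] for b in actions])
    shaped_r = r + beta * uncertainty            # bias folded in as extra reward
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (shaped_r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Minimal usage example with an arbitrary transition:
Q = defaultdict(float)
actions = [0, 1, 2]
saubq_update(Q, s=0, a=1, r=1.0, s_next=2, actions=actions)
```

Note that plain reward shaping of this form does not by itself guarantee the optimal-policy invariance the paper proves for SAUBQ-learning; the paper's bias function is also updated from the learning results, which this sketch omits.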