Authors: Zhang Wentao, Wang Lu[1], Cheng Yaodong[1]
Affiliations: [1] Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049; [2] University of Chinese Academy of Sciences, Beijing 100049
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2019, No. 7, pp. 1578-1586 (9 pages)
Funding: National Key Research and Development Program of China (2017YFB0203203); National Natural Science Foundation of China (11575223)
Abstract: High energy physics computing is a typical data-intensive application. The throughput and response time of the distributed storage system are its key performance indicators and are usually the main targets of performance optimization. A distributed storage system exposes a large number of tunable parameters, and their settings strongly influence system performance. At present, these parameters are either set to static values or tuned automatically by heuristic rules defined by experienced administrators. Given the diversity of data access patterns and hardware configurations, and the difficulty of deriving heuristic rules from human experience for hundreds of interacting parameters, neither approach performs satisfactorily. In fact, if the tuning engine is regarded as an agent and the storage system as the environment, parameter tuning of the storage system is a typical sequential decision problem. Therefore, based on the data access characteristics of high energy physics computing, we propose an automated parameter tuning method based on reinforcement learning. Experiments show that, in the same test environment and with the default parameters of the Lustre file system as the baseline, the method increases throughput by about 30%.
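To make the agent/environment framing concrete, here is a minimal sketch of reinforcement-learning parameter tuning. The abstract does not name the RL algorithm, the state and action spaces, or the reward used in the paper, so everything below is an assumption for illustration: tabular Q-learning stands in for the authors' method, a single hypothetical parameter with the values in `SETTINGS` stands in for the real Lustre tunables, and `simulated_throughput` is a synthetic curve rather than a real measurement. The agent (tuning engine) picks an action (step the setting down, keep it, or step it up), the environment (storage system) returns the next setting and a throughput-based reward, and the Q-table learns which move to make from each setting.

```python
import random

# Hypothetical discrete values for ONE tunable storage parameter (illustrative only).
SETTINGS = [1, 2, 4, 8, 16, 32, 64]
# Actions: step down to the next lower setting, keep the current one, step up.
ACTIONS = (-1, 0, 1)


def simulated_throughput(setting: int) -> float:
    """Synthetic stand-in for measured throughput (arbitrary units), peaking at 16.

    A real tuner would apply the setting to the file system and run a workload
    to measure throughput; this toy curve only keeps the sketch self-contained.
    """
    return 1000.0 - 30.0 * (setting.bit_length() - 5) ** 2


def step(state: int, action: int) -> tuple[int, float]:
    """Environment: apply an action and return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action], 0), len(SETTINGS) - 1)
    reward = simulated_throughput(SETTINGS[next_state]) / 1000.0  # normalized
    return next_state, reward


def best_action(q_row: list[float]) -> int:
    """Greedy choice: index of the highest Q-value (ties go to the lowest index)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def train(episodes: int = 2000, horizon: int = 20, alpha: float = 0.5,
          gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in SETTINGS]
    for _ in range(episodes):
        state = rng.randrange(len(SETTINGS))  # start each episode at a random setting
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))  # explore
            else:
                action = best_action(q[state])  # exploit
            next_state, reward = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_rollout(q: list[list[float]], start: int, steps: int = 10) -> int:
    """Follow the learned greedy policy from `start`; return the final setting."""
    state = start
    for _ in range(steps):
        state, _ = step(state, best_action(q[state]))
    return SETTINGS[state]


if __name__ == "__main__":
    q_table = train()
    print("tuned setting:", greedy_rollout(q_table, start=0))
```

Treating tuning as a sequential decision problem means the agent learns a policy over many interactions instead of relying on fixed heuristic rules. A tabular method is only workable for a handful of discrete values; with hundreds of interacting parameters, as the abstract describes, the Q-table would have to be replaced by a function approximator, and each environment step would be a real benchmark run rather than a call to a toy function.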