Authors: Wan-Rong Gao (高琬蓉); Jian-Bin Fang (方建滨); Chun Huang (黄春); Chuan-Fu Xu (徐传福); Zheng Wang (王峥)
Affiliations: [1] College of Computer Science, National University of Defense Technology, Changsha 410073, China; [2] School of Computing, University of Leeds, Leeds LS2 9JT, U.K.
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2023, No. 6, pp. 1323-1338 (16 pages)
Funding: Funded by the National Key Research and Development Program of China under Grant No. 2018YFB0204301 and the National Natural Science Foundation of China under Grant Nos. 61972408 and 61872294.
Abstract: Cache performance is a critical design constraint for modern many-core systems. Since the cache often works in a "black-box" manner, it is difficult for the software to reason about the cache behavior to match the running software to the underlying hardware. To better support code optimization, we need to understand and characterize the cache behavior. While cache performance characterization is heavily studied on traditional x86 architectures, there is little work for understanding the cache implementations on emerging ARMv8-based many-cores. This paper presents a comprehensive study to evaluate the cache architecture design on three representative ARMv8 multi-cores, Phytium 2000+, ThunderX2, and Kunpeng 920 (KP920). To this end, we develop wrBench, a micro-benchmark suite to measure the realized latency and bandwidth of caches at different memory hierarchies when performing core-to-core communication. Our evaluation provides inter-core latency and bandwidth in different cache levels and coherency states for the three ARMv8 many-cores. The quantitative performance data is shown in tables. We mine the characteristics of caches and coherency protocols by analyzing the data for the three processors, Phytium 2000+, ThunderX2, and KP920. Our paper also provides discussions and guidelines for optimizing memory access on ARMv8 many-cores.
Keywords: ARMv8 many-core cache architecture microbenchmark core-to-core communication
Classification No.: TP31 [Automation and Computer Technology: Computer Software and Theory]