Authors: YE Yuxi (叶雨曦); FU You (傅游); LIANG Jianguo (梁建国); MENG Xianfen (孟现粉); LIU Ying (刘颖) [3]; HUA Rong (花嵘)
Affiliations: [1] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; [2] Zhongke Cambrian Technology Co., Ltd., Beijing 100191, China; [3] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Source: Journal of Shandong University of Science and Technology (Natural Science) (《山东科技大学学报(自然科学版)》), 2021, No. 4, pp. 76-85 (10 pages)
Funding: National Key Research and Development Program of China (2016YFB0200803); Key Research and Development Program of Shandong Province (2019GGX101066).
Abstract: Multi-core and many-core processors for high-performance computing have developed rapidly. To reduce the difficulty of parallel programming and improve the efficiency of parallel computing, data-driven parallel programming models have become a research hotspot in high-performance computing. AceMesh is a dataflow-driven task-parallel programming model that supports multi-core and many-core heterogeneous platforms and automatically discovers the data-driven task-graph parallelism present in structured-grid applications. When tasks are fine-grained, however, constructing the task graph incurs considerable overhead. Based on the architecture of the SW26010 heterogeneous many-core processor, this work optimizes AceMesh's task-graph construction in three respects: communication between the master core and the slave cores (moving the communication variable to Local Data Memory, LDM), a memory pool, and the collection of tasks that have no successors. The optimizations were tested on seven hotspot subroutines from an aerospace vehicle application, and the test data show a 5x speedup in graph construction. To verify how much the construction optimizations improve AceMesh's overall performance, the speedups of the aerospace vehicle application under AceMesh and under Sunway OpenACC were compared; the speedup of the optimized AceMesh is about 1.5 times that of Sunway OpenACC.
Keywords: DAG construction optimization; task-parallel programming model; Sunway TaihuLight; Shenwei processor; performance
Classification: TP311.52 [Automation and Computer Technology: Computer Software and Theory]
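
The abstract describes deriving a task graph from the data each task touches and separately collecting tasks that have no successors. The following is a minimal sketch of that general idea only, not AceMesh's implementation or API: the `TaskGraph` class, its `add_task` and `sinks` methods, and the task and data names are all hypothetical, and the master/slave-core communication and memory-pool optimizations are not modeled.

```python
from collections import defaultdict


class TaskGraph:
    """Hypothetical data-driven DAG builder (illustration only)."""

    def __init__(self):
        self.tasks = []                      # tasks in registration order
        self.successors = defaultdict(set)   # task -> tasks that must run after it
        self.last_writer = {}                # data item -> most recent writer
        self.readers = defaultdict(list)     # data item -> readers since last write

    def add_task(self, name, reads=(), writes=()):
        self.tasks.append(name)
        for item in reads:
            writer = self.last_writer.get(item)
            if writer is not None and writer != name:
                self.successors[writer].add(name)      # read-after-write
            self.readers[item].append(name)
        for item in writes:
            writer = self.last_writer.get(item)
            if writer is not None and writer != name:
                self.successors[writer].add(name)      # write-after-write
            for reader in self.readers[item]:
                if reader != name:
                    self.successors[reader].add(name)  # write-after-read
            self.last_writer[item] = name
            self.readers[item] = []

    def sinks(self):
        """Tasks that no other task depends on (no successors)."""
        return [t for t in self.tasks if not self.successors.get(t)]


# Hypothetical four-task example on data items "a", "b", "c".
g = TaskGraph()
g.add_task("init_a", writes=["a"])
g.add_task("init_b", writes=["b"])
g.add_task("stencil", reads=["a", "b"], writes=["c"])
g.add_task("update_a", reads=["c"], writes=["a"])
print(g.sinks())
```

Each `add_task` call does bookkeeping per data item, which suggests why construction cost grows as tasks become finer-grained: the same data is split across more tasks, so more entries and edges are created.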