Authors: WANG Xinyi, WANG Yaobin [1,2], LI Ling, YANG Yang [3], BU Deqing, LIU Zhiqin [1,2]
Affiliations: [1] School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China; [2] Sichuan Civil-Military Integration Institute, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China; [3] Sichuan Institute of Computer Sciences, Chengdu 610041, China
Source: Computer Engineering (《计算机工程》), 2020, No. 8, pp. 210-215, 222 (7 pages)
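Funding: National Natural Science Foundation of China (61672438); China Scholarship Council project (CSC201908510040); Sichuan Science and Technology Program (2019YJ0326); Research Project of the Sichuan Provincial Department of Education (18ZB0603); Research Project of Southwest University of Science and Technology (18lzx451, 17lzx621); Graduate Innovation Fund of Southwest University of Science and Technology (19ycx0051).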
Abstract: Thread-Level Speculation (TLS) can improve hardware resource utilization on multicore chips and has produced good results in the automatic parallelization of many serial applications. However, subroutine-level thread speculation for HPEC applications has not yet been analyzed effectively. To address this gap, the paper designs a profiling mechanism for subroutine-level speculation together with its core data structures, selects seven representative programs from HPEC, and explores their maximum potential parallelism at the subroutine level. The speedup of each program is then analyzed in terms of thread granularity, parallel coverage, number of subroutine calls, data dependences, and source code. Experimental results show that the fdfir, svd, db, and ga programs achieve speedups between 2.23 and 11.31, while tdfir benefits the most, with a speedup of 221.78. Subroutine-level TLS is therefore better suited to evaluating the parallelism of applications that contain many subroutine calls without heavy data dependences.
Keywords: Thread-Level Speculation (TLS); multicore chip; HPEC benchmark suite; data dependence; dynamic profiling
Classification number: TP302 [Automation and Computer Technology: Computer System Architecture]