Authors: Song Liu, Yuan-Zhen Cui, Nian-Jun Zou, Wen-Hao Zhu, Dong Zhang, Wei-Guo Wu
Affiliations: [1] School of Electronic Information and Engineering, Xi’an Jiaotong University, Xi’an 710049, China; [2] School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [3] Xi’an Research Institute of Surveying and Mapping, Xi’an 710054, China; [4] State Key Laboratory of Geo-Information Engineering, Xi’an 710054, China
Source: Journal of Computer Science & Technology, 2019, No. 2, pp. 456-475 (20 pages); Chinese title: 计算机科学技术学报(英文版) (English edition)
Funding: the National Key Research and Development Program of China under Grant No. 2016YFB0201800; the National Natural Science Foundation of China under Grant Nos. 91630206 and 91330117.
Abstract: DOACROSS loops are significant parts of many important scientific and engineering applications, and their pipeline/wave-front parallelism is generally exploited through loop transformations. However, most previous work assigns iterations to parallel threads statically, which wastes computing resources on thread synchronization. This paper proposes a new parallel strategy for DOACROSS loops that combines dynamic task assignment with reduced dependences to achieve wave-front parallelism through loop tiling. The proposed strategy uses a master-slave parallel mode and several customized structures to realize dynamic and flexible parallelization, which effectively keeps threads from waiting on communication. An efficient tile size selection (TSS) approach is also proposed to preserve data reuse in cache for tiled code. Experimental results show that the proposed parallel strategy obtains good and stable speedups on six typical benchmarks with different problem sizes and different numbers of threads on a 32-core Intel® Xeon® server. It outperforms two static strategies, a barrier-based strategy and a post/wait-based strategy, by 32% and 20% in average performance, respectively, and it also yields better performance than a mutex-based dynamic strategy. In addition, the proposed TSS approach is shown to achieve near-optimal performance, comparable with a state-of-the-art TSS approach.
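The abstract describes assigning tiles of a wave-front dynamically, under a master-slave mode, as soon as their dependences are satisfied, rather than synchronizing all threads at each step. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' implementation. It assumes a 2D loop nest with the recurrence a[i][j] = a[i-1][j] + a[i][j-1] (so each tile depends only on its upper and left neighbours), square tiles of a fixed size chosen by the caller (no TSS), and a hypothetical function name `wavefront_master_slave`. The master keeps a count of unfinished predecessors per tile and dispatches a tile to the workers the moment that count reaches zero.

```python
import queue
import threading


def wavefront_master_slave(n, tile, num_workers=4):
    """Hypothetical sketch (not the paper's code): fill an n x n grid with
    a[i][j] = a[i-1][j] + a[i][j-1], first row and column set to 1,
    letting a master thread dispatch tiles dynamically to worker threads."""
    if n <= 0:
        return []
    a = [[0] * n for _ in range(n)]
    nt = (n + tile - 1) // tile  # number of tiles per dimension

    # Unfinished predecessor tiles (upper and left) for each tile.
    pending = [[int(ti > 0) + int(tj > 0) for tj in range(nt)] for ti in range(nt)]
    tasks = queue.Queue()     # master -> workers: tile coordinates, or None to stop
    finished = queue.Queue()  # workers -> master: coordinates of completed tiles

    def compute_tile(ti, tj):
        # Row-major order inside the tile satisfies the intra-tile dependences.
        for i in range(ti * tile, min((ti + 1) * tile, n)):
            for j in range(tj * tile, min((tj + 1) * tile, n)):
                a[i][j] = 1 if i == 0 or j == 0 else a[i - 1][j] + a[i][j - 1]

    def worker():
        # Slave loop: execute whatever tile the master hands out, then report back.
        while True:
            item = tasks.get()
            if item is None:
                return
            compute_tile(*item)
            finished.put(item)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    tasks.put((0, 0))  # the top-left tile has no dependences
    for _ in range(nt * nt):
        ti, tj = finished.get()  # wait for any worker to report a completed tile
        for si, sj in ((ti + 1, tj), (ti, tj + 1)):
            if si < nt and sj < nt:
                pending[si][sj] -= 1
                if pending[si][sj] == 0:
                    tasks.put((si, sj))  # all dependences met: dispatch immediately

    for _ in workers:
        tasks.put(None)  # one stop signal per worker
    for w in workers:
        w.join()
    return a


grid = wavefront_master_slave(10, tile=3, num_workers=4)
print(grid[9][9])  # 48620, i.e. C(18, 9)
```

Only the master thread touches the dependence counters, so they need no extra lock; the two queues carry both the work and the ordering between tiles. Because of CPython's global interpreter lock, this pure-Python version shows the scheduling pattern rather than real speedup; measuring performance would require a native implementation.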
Keywords: DOACROSS loop; wave-front parallelism; tile size selection; dynamic task assignment; synchronization optimization