Authors: ZHANG Han; QIAN Yu-rong [1]; WANG Yue-fei [1]; CHEN Ren-he; TIAN Chen-wei (School of Software, Xinjiang University, Urumqi 830008, China)
Affiliation: [1] School of Software, Xinjiang University
Source: Computer Engineering and Design (《计算机工程与设计》), 2019, No. 8, pp. 2181-2189 (9 pages)
Funding: National Natural Science Foundation of China (61562086, 61462079); Innovation Team Fund of the Xinjiang Uygur Autonomous Region (XJEDU2017T002)
Abstract: This work designs a parallel optimization scheme for the fixed-order Bellman-Ford algorithm on the CUDA platform, taking into account that the algorithm is both compute-intensive and data-intensive. At the kernel computation level, it proposes a memory-access optimization method and uses the fixed order to reduce thread divergence. At the CPU-GPU transfer level, it proposes a method based on CUDA streams to reduce data-transfer overhead. In tests on different graphics cards, thread blocks are partitioned with reference to shared-memory capacity, the vector dimension is reduced after iteration, and CUDA streams are used to shorten the latency of the first computation. Compared with the traditional algorithm, the improved parallel algorithm achieves a speedup of about 200 times. The scheme shows that the fixed-order approach is feasible and portable on the CUDA platform and can serve as a reference for multi-platform research.
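Keywords: fixed-order improved algorithm; Bellman-Ford algorithm; parallel computing; performance portability; graphics processing unit (GPU); Compute Unified Device Architecture (CUDA)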
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
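
For orientation, the kind of sequential baseline that a parallel Bellman-Ford is usually measured against can be sketched as follows. This is a minimal textbook version in Python, not the authors' CUDA implementation. The function name `bellman_ford`, the `(u, v, weight)` edge-list layout, and the reading of "fixed order" as scanning the edges in the same order every round are all assumptions made for illustration; the record does not define them.

```python
from math import inf


def bellman_ford(num_vertices, edges, source):
    """Single-source shortest paths on a directed graph.

    edges is a list of (u, v, weight) tuples; vertices are 0..num_vertices-1.
    Hypothetical helper for illustration, not taken from the paper.
    """
    dist = [inf] * num_vertices
    dist[source] = 0

    # Without a negative cycle, at most |V| - 1 rounds are needed.
    for _ in range(num_vertices - 1):
        changed = False
        # Assumption: edges are relaxed in the same list order each round.
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have converged early

    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist


if __name__ == "__main__":
    sample_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
    print(bellman_ford(4, sample_edges, 0))  # prints [0, 3, 1, 4]
```

The inner loop over edges is the part a GPU version would typically spread across threads, which is where memory-access patterns and thread divergence, the concerns named in the abstract, come into play.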