Source: Journal of Computer Applications (《计算机应用》), 2016, No. 12, pp. 3274-3279 (6 pages)
Funding: National Natural Science Foundation of China (61472289); Natural Science Foundation of Hubei Province (2015CFB254)
Abstract: This paper improves the parallel Particle Swarm Optimization (PSO) algorithm on the Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). From the characteristics of the CUDA hardware architecture, blocks are executed serially, and the warp is the basic unit that a Streaming Multiprocessor (SM) schedules and executes. To make full use of the thread parallelism within a block, a GPU parallel PSO algorithm based on adaptive warps is proposed: each dimension of a particle is mapped to one thread; using the warp-level parallelism of the GPU, each particle is adaptively mapped to one or more warps according to its number of dimensions; and one or more particles are adaptively mapped to each block. The method is compared with the existing coarse-grained parallel approach (one particle per thread) and the existing fine-grained parallel approach (one particle per block). The experimental results show that, relative to these two approaches, the proposed method raises the speedup over the CPU by as much as 40.
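The mapping described in the abstract can be illustrated with a small host-side sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names `adaptive_warp_layout` and `thread_to_particle_dim`, the rounding of each particle up to a whole number of warps, and the default limit of 1024 threads per block are hypothetical choices, not the authors' implementation.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def adaptive_warp_layout(dim, max_threads_per_block=1024):
    """Pick a thread layout for particles with `dim` dimensions.

    Assumed rule (not taken from the paper): give each particle the
    smallest whole number of warps that covers its dimensions, then
    pack as many particles as fit into one block.
    Returns (warps_per_particle, particles_per_block, threads_per_block).
    """
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    warps_per_particle = math.ceil(dim / WARP_SIZE)
    threads_per_particle = warps_per_particle * WARP_SIZE
    # At least one particle per block, even if it exceeds the block limit;
    # in that case the threads would have to loop over the dimensions.
    particles_per_block = max(1, max_threads_per_block // threads_per_particle)
    threads_per_block = min(particles_per_block * threads_per_particle,
                            max_threads_per_block)
    return warps_per_particle, particles_per_block, threads_per_block


def thread_to_particle_dim(thread_idx, dim):
    """Map a thread index within a block to (particle, dimension, active).

    `active` is False for padding lanes, i.e. threads in the last warp
    of a particle that have no dimension to work on.
    """
    warps_per_particle = math.ceil(dim / WARP_SIZE)
    threads_per_particle = warps_per_particle * WARP_SIZE
    particle = thread_idx // threads_per_particle
    d = thread_idx % threads_per_particle
    return particle, d, d < dim
```

Under these assumptions, a 10-dimensional problem uses one warp per particle and packs 32 particles into a 1024-thread block, while a 100-dimensional problem uses four warps per particle and packs 8. Rounding up to whole warps keeps each particle's threads in the same warps, at the cost of some idle padding lanes when the dimension is not a multiple of 32.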
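Keywords: particle swarm optimization algorithm; parallel computing; Graphics Processing Unit (GPU); Compute Unified Device Architecture (CUDA); adaptive warp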
Classification: TP301.6 [Automation and Computer Technology - Computer System Architecture]