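Affiliations: [1] State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210093 [2] Intel China Ltd., Beijing 100020
Source: Journal of Nanjing University (Natural Science), 2010, No. 2, pp. 149-158 (10 pages)
Funding: National "863" Program (2007AA01Z178); Natural Science Foundation of Jiangsu Province (BK2006712)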
Abstract: Symmetric multiprocessor (SMP) cluster systems have become the mainstream architecture in high performance computing thanks to their superior cost-performance ratio and good scalability. Within such clusters, the Intel 2-way quad-core platform has gradually become the mainstream single-node platform for high performance computing servers. Because the four cores of one CPU share a single front side bus (FSB), and the two FSBs are not fully independent, FSB competition has a large impact on the performance of memory-intensive programs. Targeting the characteristics of the Intel Bensley 2-way quad-core platform, this paper presents a computational model of the impact of FSB competition on the performance of memory-intensive Message Passing Interface (MPI) programs, and validates the model with a purpose-written program and with real examples.
Symmetric multiprocessor (SMP) clusters are the mainstream architecture in high performance computing (HPC) because of their good cost-performance ratio and excellent scalability, and the Intel 2-way Quad-Core platform is the mainstream platform for a single node. However, on the popular Intel 2-way Quad-Core platform named Bensley, front side bus (FSB) competition heavily affects the performance of memory-intensive applications, because the four cores in each CPU share a single FSB and the two FSBs are not completely independent. Message Passing Interface (MPI) is a specification, with corresponding implementations, that allows many computers to communicate with one another. It is widely accepted in the parallel computing community because of its high performance, scalability, and portability. This paper gives a model to predict the performance impact of FSB competition on memory-intensive MPI applications on the Intel Bensley 2-way Quad-Core platform. To discuss the issue, we introduce a new variable called Speeddown to describe the performance decline caused by FSB competition. Generally, a complex HPC MPI application can be divided into a number of basic blocks, within each of which bus utilization is continuous and balanced. By analyzing the address bus utilization and data bus utilization of the system when running a single basic-block process bound to core 0, and the relationship between bus utilization and the amount of data read from and written back to memory, we deduce equations to predict the Speeddown when running 2/4/8 basic-block processes bound to different cores. For complex memory-intensive MPI applications, we focus on computing time to study the performance impact of FSB competition. Since the computing time can be divided into serial time and parallel time, we analyze their Speeddown separately when creating 4 or 8 processes bound to certain cores. Then a method is introduced to merge them together and create the final performance impact model. A testing application is programmed to validate the effective
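Keywords: memory-intensive applications; Bensley; front side bus; address bus utilization; data bus utilization
Classification number: TP302.1 [Automation and Computer Technology: Computer System Architecture]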