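Authors: Luo Hongbing (罗红兵)[1], Zhang Xiaoxia (张晓霞)[1], Wang Wei (王伟)[1], Wu Linping (武林平)[1]
Affiliation: [1] High Performance Computing Center, Institute of Applied Physics and Computational Mathematics, Beijing 100094
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2014, No. 6, pp. 1263-1269 (7 pages)
Funding: National High Technology Research and Development Program of China ("863" Program), Major Special Project (2012AA01A309)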
Abstract: Although the performance of high-performance computers keeps rising quickly, it is difficult for real scientific computing applications to achieve a matching gain. An application must be optimized for the characteristics of the machine architecture, such as inter-node communication, intra-node connection, the hierarchical memory structure, and the architecture of a single processor core. Single-core instruction-level optimization is one important part of this work. Taking an Euler program built on the JASMIN framework (J adaptive structured meshes applications infrastructure) as an example, the paper examines the concrete performance problems of a scientific computing application on a cluster of Intel Xeon multi-core processors and how to improve its instruction-level parallel efficiency on a single core. The authors identify the performance hotspots with performance analysis tools, analyze the performance monitoring data to derive the bottlenecks, and tailor the code to fit the characteristics of the single-core architecture. After optimization, the total run time of the program is reduced by 21% to 34% across a 2D and a 3D physical model, and the execution time of the core module Gas1dapproxy is reduced by more than 50% (60% to 62%). The experiments show that pipeline efficiency has become a key factor in the computational efficiency of real scientific computing applications. It can be improved by reducing the degree of dependence among computation statements and by decreasing the number of long-latency operations, for example by replacing division with multiplication.
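Keywords: performance analysis; performance optimization; Xeon; instruction-level optimization; scientific computing programs
Classification No.: TP302 [Automation and Computer Technology: Computer System Architecture]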
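The two remedies named in the abstract, replacing division with multiplication and reducing dependence between computation statements, can be illustrated with a minimal C sketch. This is not the authors' code: the record does not show Gas1dapproxy, and the function names below (scale_div, scale_mul, sum_chain, sum_split) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one floating-point division per element.
   Division is a long-latency operation on most cores. */
void scale_div(double *out, const double *in, size_t n, double d) {
    assert(d != 0.0);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / d;
}

/* Division replaced by multiplication: the reciprocal is computed once
   outside the loop. Results can differ from scale_div in the last bit
   when 1.0 / d is not exactly representable. */
void scale_mul(double *out, const double *in, size_t n, double d) {
    assert(d != 0.0);
    const double inv = 1.0 / d;
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * inv;
}

/* Baseline: a single accumulator, so every addition depends on the
   result of the previous one (one long dependence chain). */
double sum_chain(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Reduced dependence: four independent accumulators form four shorter
   chains that the pipeline can overlap. Summation order changes, so
   rounding may differ slightly from sum_chain. */
double sum_split(const double *x, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)   /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both rewrites change floating-point rounding, which is why compilers generally do not apply them unless relaxed floating-point options are enabled. Whether they help on a given machine is something to measure, not assume.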