Research on Intercore Communication of the Kunpeng CPU Based on MPI (基于MPI的鲲鹏CPU核间通信研究)


Authors: ZHOU Yan (周岩), WANG Peng (王鹏) [1,3], WANG Kun-yu (王琨予) [1]

Affiliations: [1] School of Computer Science and Engineering, Southwest Minzu University, Chengdu 610041, Sichuan, China; [2] Xi'an Color Cloud Software Technology Co., Ltd., Xi'an 712034, Shaanxi, China; [3] Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, Sichuan, China; [4] Guangdong Domestic Server Engineering Center, Guangzhou 510000, Guangdong, China

Source: Journal of Southwest Minzu University (Natural Science Edition) (《西南民族大学学报(自然科学版)》), 2024, No. 3, pp. 328-335 (8 pages)

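Funding: National Natural Science Foundation of China (60702075); Fundamental Research Funds for the Central Universities, Southwest Minzu University (2017NZYQN27); Special Fund for Science and Technology Development of the Guangdong Provincial Department of Science and Technology (2016B090918062); Guangzhou Major Special Project for Industry-University-Research Collaborative Innovation (201604010115).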

Abstract: Intercore communication latency is an important factor in the overall efficiency of high-performance computing systems. The domestic Kunpeng CPU is increasingly widely used in high-performance computing. This paper analyzes the cache architecture and multi-core interconnect interfaces of the Kunpeng CPU and studies the factors that affect its intercore communication latency. Intra-node intercore communication experiments were carried out in a message passing interface (MPI) environment, comparing latency across different modes, including communication across L3 caches and across physical CPUs. When messages exceeded 500 KB, communication across L3 Cache TAGs showed lower latency than communication sharing an L3 Cache TAG. A latency anomaly at a message size of 64 KB was traced to the default switching threshold between MPI's Eager and Rendezvous modes. Experiments comparing the two modes verified the latency characteristics of different message sizes under each mode and across cores, and showed that the Eager mode is better suited to low-latency transmission of small messages. In practice, the default switching threshold can be adjusted according to message size to achieve better transmission performance. The results indicate that, because of the Kunpeng CPU's complex multi-core structure, parallel programs can be optimized specifically for it to improve their efficiency.
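The Eager/Rendezvous switch described in the abstract can be pictured as a size threshold applied to each message. The following is a minimal sketch in plain Python, not the authors' code and not any MPI library's API: the names `choose_protocol` and `sweep` are hypothetical, the 64 KiB default simply mirrors the message size at which the abstract reports the anomaly, and the strict "smaller than the limit" boundary rule is an assumption, since real MPI implementations differ in both the default value and the boundary case.

```python
KIB = 1024  # bytes per KiB


def choose_protocol(message_bytes: int, eager_limit: int = 64 * KIB) -> str:
    """Return the protocol a threshold-based scheme would pick for a message.

    Assumption: messages strictly smaller than eager_limit are sent eagerly,
    everything else uses rendezvous. The default of 64 KiB is illustrative.
    """
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    return "eager" if message_bytes < eager_limit else "rendezvous"


def sweep(sizes, eager_limit: int = 64 * KIB):
    """Pair each message size with the protocol chosen under eager_limit."""
    return [(n, choose_protocol(n, eager_limit)) for n in sizes]


if __name__ == "__main__":
    # Power-of-two message sizes from 8 KiB to 1024 KiB.
    sizes = [2 ** k * KIB for k in range(3, 11)]
    for n, proto in sweep(sizes):
        print(f"{n // KIB:5d} KiB -> {proto}")
```

Raising `eager_limit` in this sketch moves larger messages onto the eager path, which corresponds to the abstract's suggestion of adjusting the default switching threshold to match the application's message sizes. In a real MPI installation the threshold is set through implementation-specific tuning parameters rather than through a function argument.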

Keywords: Kunpeng CPU; intercore communication; message passing interface; high-performance computing; shared cache

Classification No.: TP332 [Automation and Computer Technology: Computer System Architecture]

 
