Authors: Wang Dawei (王达伟)[1,2,3], Cao Zheng (曹政)[1,2,3], Liu Xinchun (刘新春)[1,2], You Dingshan (游定山)[1,2,3], Sun Ninghui (孙凝晖)[1,2]
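Affiliations: [1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing 100190; [3] Graduate University of the Chinese Academy of Sciences, Beijing 100049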
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2008, No. 12, pp. 2069-2078 (10 pages)
Funding: National High Technology Research and Development Program of China ("863" Program), grant 2006AA01A102
Abstract: High-performance interconnection network switches are a core component of high-performance computing (HPC) systems. Scientific computing, the upper-layer application of HPC systems, demands not only low latency and high bandwidth from the switch but also hardware-level support for collective communication such as broadcast, multicast, and barrier synchronization. The HyperLink switch, a key component of the Dawning 5000 interconnection network, has a single-stage latency of 38.4 ns and an aggregate bandwidth of 160 Gbps, and supports 16 multicast groups and 16 barrier groups simultaneously. Under ideal conditions, multicast and barrier operations across 1024 nodes complete within 2 μs, which greatly accelerates scientific applications. To evaluate the switch's performance, a cycle-accurate simulation model was built. Simulation shows that, for a 16-port input-buffered switch, 3 virtual channels offer the best cost-performance trade-off, and that with a 1 KB MTU a 4 KB input buffer is sufficient to reach the highest unicast throughput. A theoretical analysis compares multi-rail and single-rail networks of equal total bandwidth; it shows that the former effectively reduces network latency and can therefore deliver higher network throughput than the latter. The LogP model is used to analyze HyperLink multicast and barrier performance; the analysis shows that the HyperLink switch scales well and can support up to several thousand nodes.
Keywords: interconnection network; switch; collective communication; multicast; barrier; ASIC design
Classification: TP393 [Automation and Computer Technology: Computer Application Technology]