Event-Driven Method for MapReduce Traffic Generation and Network Evaluation

Cited by: 1


Authors: SHAO En [1,2]; SUN Ning-Hui; GUO Jia-Liang [1,2]; YUAN Guo-Jun; WANG Zhan [1]; CAO Zheng

Affiliations: [1] State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049

Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 10, pp. 2265-2281 (17 pages)

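Funding: Supported by the National Key Research and Development Program of China (2016YFB0200300; 2016YFGX030148; 2016YFB0200205; 2016GZKF0JT006), the National Natural Science Foundation of China (61572464; 61402444), and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB24060600).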

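Abstract: Large-scale network architecture design is one of the core technologies for building large-scale distributed systems and exascale high-performance computing clusters. Designers of the underlying network need to select and optimize the network architecture according to the communication traffic characteristics of the upper-layer applications. An improper application communication model causes the network design to diverge from actual requirements, which in turn degrades communication and overall system performance. Traditional traffic modeling methods based on "black-box" data analysis suffer from coarse-grained workload modeling and poor scalability with respect to application data size. This work introduces an "event-driven" approach that simulates the internal logic of the workload, and proposes a traffic modeling and traffic generation method for MapReduce, a mainstream computing paradigm. Comparison against real application traffic shows that the method accurately reflects the characteristics of the network traffic produced by MapReduce jobs. Based on this accurate traffic model, the paper presents a simulation-based performance analysis of four mainstream data center networks. The results show that, compared with uniformly random traffic, the performance of the same network drops by more than 30% under MapReduce-characteristic traffic, so characteristic traffic exposes network congestion and bottlenecks more clearly. The conclusions drawn from the simulation experiments on network performance bottlenecks, topology scalability, and network cost-effectiveness provide a new basis for network selection and performance optimization in large-scale data centers.

Interconnection network design is one of the core technologies in the construction of exascale clusters and large-scale distributed systems. Such large-scale computing systems are expected to be achieved in the near future owing to rapid innovation in semiconductor logic and memory, architectures, interconnects, and other industry technologies. Among these, because of performance and cost factors, the interconnection network plays a critical role in a large-scale computing system. In large-scale clusters and datacenters, interconnection network design faces growing challenges. Firstly, the increasing computing capacity of a single node requires the network to provide higher bandwidth and lower latency. Secondly, the increasing number of nodes requires the network to have much better scalability. Thirdly, the increasing scale of the system leads to worse collective communication performance, which harms the performance and scalability of applications. Fourthly, the increasing number of devices requires the network to have better reliability. As the performance of compute nodes keeps increasing, the interconnection network has gradually become the bottleneck of large-scale computing systems. However, the switch chip, the core component of the interconnection network, can offer only limited aggregate bandwidth because of the constraints of physical processes and packaging technologies. Designers of the underlying network should consider the characteristics of the network traffic when selecting and optimizing the network architecture. An improper traffic model will cause a mismatch between the network architecture and the characteristics of communication, which will reduce the overall performance of data centers and clusters. Big data platforms offer cost-effective data processing through simplified programming and parallel computing, and are increasingly recognized by industry. In recent years, the high-performance computing community has also increasingly used big data platforms for HPC data processing, which ha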

Keywords: distributed systems; MapReduce; data center networks; event-driven; large-scale network simulation

Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]

 
