Design of Optical Switching Computing Cluster System for AI Model Training in Automation Field


Authors: Li Ze (黎泽) [1], Peng Huibin (彭慧斌) [2]

Affiliations: [1] School of Electrical Engineering, Guangzhou Railway Polytechnic, Guangzhou 511300, China; [2] School of Automation, Guangdong Polytechnic Normal University, Guangzhou 510665, China

Source: Mechanical & Electrical Engineering Technology (《机电工程技术》), 2024, No. 6, pp. 105-108 (4 pages)

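Funding: Talent Introduction Project of Guangzhou Railway Polytechnic, "Research on Key Technologies of Optical Switching Computing Clusters Driven by AI Services" (GTXYR2318).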

Abstract: AI model training in the automation field demands ever more computing power. To meet this demand, an optical switching computing cluster system was designed. It includes complete control and communication processes and can achieve higher bandwidth and lower latency than an electrical switching computing cluster. At the system level, detailed performance models were built that cover the hardware and software overheads inside the AI servers as well as the network, algorithm, and communication overheads, so that the performance of an optical switching computing cluster for AI model training can be quantified. On this basis, a performance simulation software for such clusters was developed. Under different parameter settings, the results computed by the software agree with the theoretical calculations, and the average simulation run time is 0.432 s. Users enter parameters through a graphical user interface; the software substitutes them into the modeling formulas and displays the results on the interface. Its building-block style of system construction and menu-bar parameter settings lower the barrier to entry and the operating difficulty, making it easy to simulate the performance of an optical switching computing cluster and to guide the design and optimization of the entire system.
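The abstract does not give the modeling formulas themselves, so the following is only a minimal sketch of the kind of "parameters in, formula out" calculation it describes. It assumes data-parallel training with gradients synchronized by ring all-reduce, priced with the standard alpha-beta (latency plus bytes over bandwidth) cost model, and assumes no overlap between computation and communication. The model choice, the class `ClusterParams`, the functions `ring_allreduce_s`, `iteration_s`, and `scaling_efficiency`, and all numeric values are hypothetical illustrations, not the authors' method or data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterParams:
    """Hypothetical input parameters for one data-parallel training iteration."""
    num_servers: int            # N: AI servers taking part in gradient synchronization
    compute_s: float            # forward + backward compute time per iteration (s)
    software_overhead_s: float  # lumped framework/driver overhead per iteration (s)
    gradient_bytes: float       # S: gradient volume synchronized per iteration (bytes)
    bandwidth_Bps: float        # B: per-server link bandwidth (bytes/s)
    step_latency_s: float       # alpha: per-step switch + link latency (s)


def ring_allreduce_s(p: ClusterParams) -> float:
    """Alpha-beta cost of ring all-reduce: 2(N-1) steps, each moving S/N bytes."""
    n = p.num_servers
    if n <= 1:
        return 0.0  # nothing to synchronize on a single server
    steps = 2 * (n - 1)  # N-1 reduce-scatter steps + N-1 all-gather steps
    chunk = p.gradient_bytes / n
    return steps * (p.step_latency_s + chunk / p.bandwidth_Bps)


def iteration_s(p: ClusterParams) -> float:
    """Iteration time, assuming computation and communication do not overlap."""
    return p.compute_s + p.software_overhead_s + ring_allreduce_s(p)


def scaling_efficiency(p: ClusterParams) -> float:
    """Single-server iteration time divided by N-server iteration time (weak scaling)."""
    single = replace(p, num_servers=1)
    return iteration_s(single) / iteration_s(p)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the comparison.
    electrical = ClusterParams(
        num_servers=64, compute_s=0.20, software_overhead_s=0.01,
        gradient_bytes=2e9, bandwidth_Bps=25e9, step_latency_s=5e-6,
    )
    # Assumed optical fabric: twice the bandwidth, lower per-step latency.
    optical = replace(electrical, bandwidth_Bps=50e9, step_latency_s=1e-6)
    for name, p in (("electrical", electrical), ("optical", optical)):
        print(f"{name:>10}: iteration {iteration_s(p) * 1e3:.2f} ms, "
              f"all-reduce {ring_allreduce_s(p) * 1e3:.2f} ms, "
              f"efficiency {scaling_efficiency(p):.1%}")
```

Separating the overheads into independent terms mirrors the abstract's breakdown (server hardware and software, network, algorithm, communication): a real model would replace each term with the paper's own formula, and the comparison between fabrics reduces to changing the bandwidth and latency parameters.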

Keywords: artificial intelligence; optical switching; AI distributed training; system development

CLC number: TP303 [Automation and Computer Technology: Computer System Architecture]
