Performance Optimization Technique for Large-Scale Parallel Volume Rendering Based on Multiple Rendering Pipelines

Original title: 基于多绘制管线的大规模并行体绘制性能优化技术

Authors: WANG Huawei[1,2], LIU Ruoyan, AI Zhiwei, CAO Yi[1,2]

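Affiliations: [1] Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing 100088, China; [2] CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China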

Source: Computer Engineering (《计算机工程》), 2024, No. 8, pp. 207-215 (9 pages)

Funding: National Key Research and Development Program of China (2017YFB0202203).

Abstract: For large-scale scientific data produced by numerical simulations, volume rendering relies on high-density ray sampling to capture complex physical features, which incurs substantial computational overhead and greatly inflates the volume of sampled data. On domestic autonomous-CPU supercomputers, a single processor core is less powerful than a commercial CPU core, so more cores must be used to share the rendering work; this in turn creates a scalability bottleneck in the parallel communication of sampled data. To make full use of these machines for efficient volume rendering, this paper proposes a performance optimization technique for large-scale parallel volume rendering based on multiple rendering pipelines. The technique uses two-level parallelism, across pipelines and across the processes within each pipeline, to reduce the parallel scale of any single pipeline. The target image is divided into multiple sub-regions and the rendering processes are grouped accordingly; each process group independently executes one rendering pipeline to render its sub-region; finally, all sub-regions are collected to form and output the complete image. Experiments show that the optimized volume rendering algorithm scales to the order of 10,000 cores on domestic autonomous-CPU supercomputers and completes volume rendering tasks effectively.
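The two-level scheme described in the abstract (split the image into sub-regions, group the processes, let each group render its sub-region, then collect the pieces) can be illustrated with a small serial sketch. The abstract does not say how the image is partitioned or how processes are assigned to groups, so the horizontal-strip partition, the contiguous rank blocks, and every function name here (`split_rows`, `pipeline_of`, `render_strip`, `render_image`) are assumptions made for illustration only; `render_strip` is a placeholder, not the authors' ray-sampling and compositing pipeline.

```python
from typing import List, Tuple


def split_rows(height: int, n_pipelines: int) -> List[Tuple[int, int]]:
    """Split image rows [0, height) into n_pipelines contiguous strips.

    Assumption: sub-regions are horizontal strips of near-equal height.
    """
    base, extra = divmod(height, n_pipelines)
    strips, start = [], 0
    for p in range(n_pipelines):
        rows = base + (1 if p < extra else 0)  # first `extra` strips get one more row
        strips.append((start, start + rows))
        start += rows
    return strips


def pipeline_of(rank: int, n_procs: int, n_pipelines: int) -> int:
    """Map a process rank to a pipeline (group) index using contiguous blocks.

    Assumption: ranks are grouped in contiguous blocks of near-equal size.
    """
    return rank * n_pipelines // n_procs


def render_strip(strip: Tuple[int, int], width: int) -> List[List[int]]:
    """Placeholder for one pipeline's work on its sub-region.

    A real pipeline would sample rays and composite across the group's
    processes; here each pixel just stores its flat index.
    """
    r0, r1 = strip
    return [[r * width + c for c in range(width)] for r in range(r0, r1)]


def render_image(height: int, width: int, n_procs: int, n_pipelines: int):
    """Run every pipeline on its strip, then collect strips into one image."""
    strips = split_rows(height, n_pipelines)
    groups: List[List[int]] = [[] for _ in range(n_pipelines)]
    for rank in range(n_procs):
        groups[pipeline_of(rank, n_procs, n_pipelines)].append(rank)

    image: List[List[int]] = []
    for strip in strips:
        # Level 1: pipelines are independent of each other.
        # Level 2 (parallelism inside a group) is hidden in render_strip.
        image.extend(render_strip(strip, width))
    return image, groups


if __name__ == "__main__":
    image, groups = render_image(height=4, width=2, n_procs=8, n_pipelines=2)
    print(groups)
    print(image)
```

With 8 processes and 2 pipelines, each group holds 4 processes instead of all 8 taking part in one pipeline, which is the reduction in per-pipeline parallel scale that the abstract refers to. In an MPI program, a grouping like this would typically be expressed by splitting the global communicator (for example with `MPI_Comm_split`), although the abstract does not state how the authors implement it.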

Keywords: volume rendering; multiple pipelines; two-level parallelism; parallel scalability; performance optimization

Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
