Performance Optimization Technique for Large-Scale Parallel Volume Rendering Based on Multiple Rendering Pipelines

Original title: 基于多绘制管线的大规模并行体绘制性能优化技术


Authors: WANG Huawei[1,2]; LIU Ruoyan; AI Zhiwei; CAO Yi[1,2]

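Affiliations: [1] Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing 100088, China; [2] CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China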

Source: Computer Engineering (《计算机工程》), 2024, No. 8, pp. 207-215 (9 pages)

Funding: National Key Research and Development Program of China (2017YFB0202203).

Abstract: For large-scale scientific data produced by numerical simulations, volume rendering relies on high-density ray sampling to capture complex physical features, which incurs substantial computational overhead and a large increase in data volume. On domestic autonomous-CPU high-performance computers, a single processor core delivers less computing power than a commercial CPU core, so more cores must be used to share the volume rendering workload; this creates a scalability bottleneck in the parallel communication of sampling data. To make full use of these machines for efficient volume rendering, this paper proposes a performance optimization technique for large-scale parallel volume rendering based on multiple rendering pipelines. The technique reduces the parallel scale of each individual pipeline through two-level parallelism: first across pipelines, then across the processes within a pipeline. The target image is divided into multiple sub-regions and the rendering processes are grouped accordingly. Each process group independently executes one rendering pipeline to render its sub-region, and finally all sub-regions are collected to form and output the complete image. Experiments show that the optimized volume rendering algorithm scales to approximately 10,000 processor cores on domestic autonomous-CPU high-performance computers and completes volume rendering tasks effectively.
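The two-level scheme in the abstract (split the image into sub-regions, group the processes, one pipeline per group) can be illustrated with a small planning sketch. This is not the authors' implementation: the abstract does not say how the image is partitioned or how ranks are assigned, so the horizontal-strip split, the contiguous rank grouping, and the names `Region`, `split_rows`, and `plan_pipelines` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Image sub-region in pixels; x1 and y1 are exclusive bounds."""
    x0: int
    y0: int
    x1: int
    y1: int


def split_rows(height, parts):
    """Split [0, height) into `parts` contiguous strips of near-equal size."""
    base, extra = divmod(height, parts)
    bounds, start = [], 0
    for i in range(parts):
        # The first `extra` strips get one additional row.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def plan_pipelines(num_procs, num_pipelines, width, height):
    """Assign each process rank to a pipeline group and each group to a strip.

    Assumption (hypothetical, not from the paper): horizontal strips and
    equally sized groups of contiguous ranks.
    """
    if num_pipelines < 1 or num_procs % num_pipelines != 0:
        raise ValueError("num_procs must be a positive multiple of num_pipelines")
    group_size = num_procs // num_pipelines
    regions = [Region(0, y0, width, y1) for y0, y1 in split_rows(height, num_pipelines)]
    # group_of[rank] is the pipeline that rank belongs to.
    group_of = [rank // group_size for rank in range(num_procs)]
    return regions, group_of


regions, group_of = plan_pipelines(num_procs=8, num_pipelines=4, width=1024, height=1000)
```

In an MPI program, a group index like `group_of[rank]` could serve as the color passed to `MPI_Comm_split`, so that each pipeline communicates within a communicator of `num_procs / num_pipelines` processes instead of all of them. A final gather of the sub-regions onto one process would then assemble the full image.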

Keywords: volume rendering; multiple pipelines; two-level parallelism; parallel scalability; performance optimization

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
