Authors: ZHOU Hui (周辉)[1]; ZHU Huming (朱虎明)[2,3,4]; GAO Tianqi (高天琦); DONG Ximiao (董西淼); ZHANG Lingyun (张凌云); LIU Huijie (刘卉杰); CHEN Zhipeng (陈志鹏)
Affiliations: [1] The Second Monitoring and Application Center, CEA, Xi'an 710054, Shaanxi, China; [2] Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, Xi'an 710071, Shaanxi, China; [3] School of Artificial Intelligence, Xidian University, Xi'an 710071, Shaanxi, China; [4] Hangzhou Institute of Technology, Xidian University, Hangzhou 311231, Zhejiang, China
Source: Journal of Disaster Prevention and Mitigation Engineering (《防灾减灾工程学报》), 2025, No. 1, pp. 21-33 (13 pages)
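Funding: Supported by the Key Research and Development Program of Shaanxi Province (2022ZDLGY01-09); the Guanghe Fund (光合基金) (202302019674); and the Natural Science Basic Research Program of Shaanxi Province (2023-JC-YB-242).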
Abstract: AWP-ODC is a software package for large-scale 3D seismic simulation based on the finite-difference numerical method. Because of foreign export restrictions on high-performance computing chips to China, there is an urgent need for China to develop its own high-performance computing chips and the accompanying software ecosystem. Early work on accelerating AWP-ODC was designed and optimized mainly for the NVIDIA GPU software and hardware architecture. In recent years, a variety of heterogeneous computing platforms have developed rapidly, and how to accelerate AWP-ODC on these new hardware and software platforms is of significant research value. To this end, this paper ports AWP-ODC to a domestic accelerator card. For the time-consuming kernel function dstrqc, the running time is reduced through GPU memory-access optimization and grid-parameter optimization. Finally, performance tests are carried out on single-card and dual-card configurations of the domestic GPU-like accelerator, using the Fréchet Kernels seismic dataset and the 8·3 Ludian earthquake dataset. The experimental results show that, in the single-card environment, the FLOPS for the two datasets increased by 30.51% and 25.21%, respectively; in the dual-card environment, the FLOPS for the two datasets increased by 9.42% and 23.6%, respectively.