Authors: Yu-Jin Yan (严昱瑾); Hai-Bo Li (李海波); Tong Zhao (赵曈); Lin-Wang Wang (汪林望); Lin Shi (石林); Tao Liu (刘涛); Guang-Ming Tan (谭光明); Wei-Le Jia (贾伟乐); Ning-Hui Sun (孙凝晖)
Affiliations: [1] State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 101408, China; [3] Computing System Optimization Laboratory, Huawei Technologies, Beijing 100094, China; [4] Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; [5] School of Materials Science and Engineering, Yancheng Institute of Technology, Yancheng 224051, China
Source: Journal of Computer Science & Technology (计算机科学技术学报, English edition), 2024, Issue 1, pp. 45-62 (18 pages)
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2021YFB0300600; the National Natural Science Foundation of China under Grant Nos. 92270206, T2125013, 62032023, 61972377, T2293702, and 12274360; the Chinese Academy of Sciences Project for Young Scientists in Basic Research under Grant No. YSBR-005; the Network Information Project of Chinese Academy of Sciences under Grant No. CASWX2021SF-0103; and the Key Research Program of Chinese Academy of Sciences under Grant No. ZDBSSSW-WHC002.
Abstract: The growing demand for semiconductor device simulation poses a major challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed precision computing is adopted to increase overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream, kernel fusion, and redundant computation removal are proposed to further increase utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a system of 10 million silicon atoms, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to next-generation supercomputers for larger simulations.
Keywords: single instruction multiple thread accelerator; electronic structure; high-performance computing; linearly scaling three-dimensional fragment (LS3DF)
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]