Authors: Seog Chung Seo [1], Sang Woo An [2], Dooho Choi [3]
Affiliations: [1] Kookmin University, Seoul, 02707, Korea; [2] Telecommunications Technology Association (TTA), Gyeonggi-do, 13591, Korea; [3] Korea University, Sejong, 30019, Korea
Source: Computers, Materials & Continua (English-language journal), 2023, Issue 4, pp. 1963-1980 (18 pages)
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1C1C1013368). This work was also supported in part by a Korea University Grant and in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) Grant through the Korean Government [Ministry of Science and ICT (MSIT)], Development of Physical Channel Vulnerability-Based Attacks and its Countermeasures for Reliable On-Device Deep Learning Accelerator Design, under Grant 2021-0-00903.
Abstract: Since 2016, the National Institute of Standards and Technology (NIST) has been running a competition to standardize post-quantum cryptography (PQC). Although Falcon has been selected in the competition as one of the standard PQC algorithms because of its short key and signature sizes, its performance overhead is larger than that of other lattice-based cryptosystems. This study presents multiple methodologies to accelerate Falcon using graphics processing units (GPUs) for server-side use. Direct GPU porting significantly degrades performance because the Falcon reference code relies on recursive functions in its sampling process. Thus, an iterative sampling approach for efficient parallel processing is presented. The Falcon software in this study applies a fine-grained execution model, and the optimal number of threads in a thread block is reported. Moreover, polynomial multiplication is optimized by parallelizing both the number-theoretic transform (NTT)-based polynomial multiplication and the fast Fourier transform (FFT)-based multiplication. Furthermore, dummy-based parallel execution methods are introduced to handle thread divergence effects. On an NVIDIA RTX 3090 GPU, the presented Falcon software runs 35.14, 28.84, and 34.64 times faster with Falcon-512 parameters, and 33.31, 27.45, and 34.40 times faster with Falcon-1024 parameters, in key generation, signing, and verification, respectively, than the central processing unit (CPU) reference implementation using Advanced Vector Extensions 2 (AVX2) instructions on a Ryzen 9 5900X running at 3.7 GHz. Therefore, the proposed Falcon software can be used in servers managing multiple concurrent clients for efficient certificate verification, and as an outsourced key generation and signature generation server for Signature as a Service (SaS).
Keywords: DSA; FALCON; GPU; CUDA; software optimization
Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
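Illustrative note: the abstract refers to NTT-based polynomial multiplication. The sketch below is not the authors' GPU code. It is a minimal single-threaded Python illustration of the general technique, assuming Falcon's modulus q = 12289 and the ring Z_q[x]/(x^n + 1). All function names (`find_psi`, `ntt`, `negacyclic_mul`, `schoolbook_mul`) are hypothetical and chosen only for this example.

```python
Q = 12289  # Falcon's prime modulus; Q - 1 = 2^12 * 3

def find_psi(n, q=Q):
    """Search for a primitive 2n-th root of unity modulo q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)
        if pow(cand, n, q) == q - 1:  # cand^n = -1, so its order is exactly 2n
            return cand
    raise ValueError("no primitive 2n-th root of unity found")

def ntt(a, omega, q=Q):
    """Iterative (non-recursive) cyclic NTT of a, using the n-th root omega."""
    n = len(a)
    a = list(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:  # butterfly passes
        w_len = pow(omega, n // length, q)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % q
                a[start + k] = (u + v) % q
                a[start + k + half] = (u - v) % q
                w = w * w_len % q
        length <<= 1
    return a

def negacyclic_mul(f, g, q=Q):
    """Multiply f and g in Z_q[x]/(x^n + 1) via twisting plus a cyclic NTT."""
    n = len(f)
    psi = find_psi(n, q)
    psi_inv = pow(psi, q - 2, q)  # modular inverse, valid because q is prime
    omega = psi * psi % q
    omega_inv = pow(omega, q - 2, q)
    n_inv = pow(n, q - 2, q)
    fa = ntt([f[i] * pow(psi, i, q) % q for i in range(n)], omega, q)
    ga = ntt([g[i] * pow(psi, i, q) % q for i in range(n)], omega, q)
    h = ntt([x * y % q for x, y in zip(fa, ga)], omega_inv, q)
    return [h[i] * n_inv % q * pow(psi_inv, i, q) % q for i in range(n)]

def schoolbook_mul(f, g, q=Q):
    """Quadratic-time reference: reduce with x^n = -1."""
    n = len(f)
    h = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                h[i + j] = (h[i + j] + f[i] * g[j]) % q
            else:
                h[i + j - n] = (h[i + j - n] - f[i] * g[j]) % q
    return h
```

For small inputs, `negacyclic_mul(f, g)` can be compared against `schoolbook_mul(f, g)` as a sanity check. In a GPU setting, the inner butterfly loops and the pointwise product are the natural candidates for distribution across threads, though how the paper maps them to threads is not stated in this record.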