Falcon后量子算法的密钥树生成部件GPU并行优化设计与实现  

GPU Parallel Optimal Design and Implementation of Key Tree Generation Components for Falcon Post-Quantum Algorithms

在线阅读下载全文

作  者:张磊 赵光岳 肖超恩 王建新 ZHANG Lei;ZHAO Guangyue;XIAO Chaoen;WANG Jianxin(Department of Electronic and Communication Engineering,Beijing Electronic Science and Technology Institute,Beijing 100070,China;Department of Cyberspace Security,Beijing Electronic Science and Technology Institute,Beijing 100070,China)

机构地区:[1]北京电子科技学院电子与通信工程系,北京100070 [2]北京电子科技学院网络空间安全系,北京100070

出  处:《计算机工程》2024年第9期208-215,共8页Computer Engineering

基  金:国家重点研发计划(2017YFB0801803);中央高校基本科研业务费资金(328202278,328202261,3282023005);北京高等教育“本科教学改革创新项目”(202110018002);北京电子科技学院一流学科建设项目(20210064Z0401)。

摘  要:近年来,后量子密码算法因其具有抗量子攻击的特性成为安全领域的研究热点。基于格的Falcon数字签名算法是美国国家标准与技术研究所(NIST)公布的首批4个后量子密码标准算法之一。密钥树生成是Falcon算法的核心部件,在实际运算中占用较多的时间和消耗较多的资源。为此,提出一种基于图形处理器(GPU)的Falcon密钥树并行生成方案。该方案使用奇偶线程联合控制的单指令多线程(SIMT)并行模式和无中间变量的直接计算模式,达到了提升速度和减少资源占用的目的。基于Python的CUDA平台进行了实验,验证结果的正确性。实验结果表明,Falcon密钥树生成在RTX 3060 Laptop的延迟为6 ms,吞吐量为167次/s,在计算单个Falcon密钥树生成部件时相对于CPU实现了1.17倍的加速比,在同时并行1024个Falcon密钥树生成部件时,GPU相对于CPU的加速比达到了约56倍,在嵌入式Jetson Xavier NX平台上的吞吐量为32次/s。Recently,post-quantum cryptographic algorithms have become a popular research topic in the field of security owing to their resistance to quantum attacks.The lattice-based Falcon digital signature algorithm is one of the first four post-quantum cryptographic standard algorithms published by NIST.Key tree generation is the core component of the Falcon algorithm,which requires more time and consumes more resources during actual operation.Therefore,this study proposes a GPU-based parallel key tree generation scheme for Falcon that uses the Single Instruction Multiple Threads(SIMT)parallel mode with joint control of parity threads and the direct computation mode without intermediate variables to achieve speedup and reduce resource consumption.Experiments are conducted on a Python-based CUDA platform to verify the accuracy of the results.Falcon key tree generation for the RTX 3060 Laptop has a latency of 6 ms and a throughput rate of 167 times/s.It achieves a 1.17 acceleration ratio relative to the CPU when computing a single Falcon key tree generating part,whereas the GPU achieves an approximately 56 acceleration ratio relative to the CPU when 1024 Falcon key tree generating parts are generated simultaneously;the throughput rate is 32 times/s on the embedded Jetson Xavier NX platform.

关 键 词:后量子密码 Falcon算法 图形处理器 CUDA平台 并行计算 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象