NTRU格基密钥封装方案GPU高性能实现  

Efficient GPU Implementation of NTRU Lattice-Based Key Encapsulation Mechanism

在线阅读下载全文

作  者:李文倩 沈诗羽 赵运磊 LI Wen-Qian;SHEN Shi-Yu;ZHAO Yun-Lei(School of Computer Science,Fudan University,Shanghai 200433;State Key Laboratory of Cryptology,Beijing 100878)

机构地区:[1]复旦大学计算机科学技术学院,上海200433 [2]密码科学技术全国重点实验室,北京100878

出  处:《计算机学报》2024年第9期2163-2178,共16页Chinese Journal of Computers

基  金:国家重点研发计划基金资助项目(No.2022YFB2701601);密码科学技术国家重点实验室面上课题基金资助项目(No.MMKFKT202227);上海市科委技术标准基金资助项目(No.21DZ2200500);上海市协同创新基金资助项目(No.XTCX-KJ-2023-54);上海市科委区块链关键技术攻关专项基金资助项目(No.23511100300)资助.

摘  要:随着量子计算技术的发展,传统加密算法受到的威胁日益严重.为应对量子计算时代的挑战,各国正积极加强后量子密码算法的实现和迁移部署工作.由于NTRU密码方案具有结构简洁、计算效率高、尺寸较小、无专利风险等优点,因此NTRU格基密钥封装算法对于后量子时代的密码技术储备和应用具有重要意义.同时,图形处理器(Graphics Processing Unit,GPU)以其强大的并行计算能力、高吞吐量、低能耗等特性,已成为当前高并发密码工程实现的重要平台.本文给出后量子密码算法CTRU/CNTR的首个GPU高性能实现方案.对GPU主要资源占用进行分析,我们综合考虑并行计算、内存访问、数据布局和算法优化等多个方面,采用一系列计算和内存优化技术,旨在并行加速计算、优化访存、合理占用GPU资源以及减少I/O时延,从而提高本方案的计算能力和性能.本文的主要贡献在于以下几个方面:首先,针对模约减操作,使用NVIDIA并行指令集实现,有效减少所需指令条数;其次,针对耗时的多项式乘法模块,采用混合基NTT,并采用层融合、循环展开和延迟约减等方法,加快计算速度;此外,针对内存重复访问和冲突访问等问题,通过合并访存、核函数融合等优化技术,实现内存的高效访问;最后,为实现高并行的算法,设计恰当的线程块大小和数量,采用内存池机制,实现多任务的快速访存和高效处理.基于NVIDIA RTX4090平台,本方案CTRU768实现中密钥生成、封装和解封装的吞吐量分别为每秒1170.9万次、926.7万次和315.4万次.与参考实现相比,密钥生成、封装和解封装的吞吐量分别提高了336倍、174倍和128倍.本方案CNTR768实现中密钥生成、封装和解封装的吞吐量分别为每秒1117.3万次、971.8万次和322.2万次.与参考实现相比,密钥生成、封装和解封装的吞吐量分别提高了329倍、175倍和134倍;与开源Kyber实现相比,密钥生成、�As quantum computing technology advances,the threats to traditional encryption algorithms are becoming more and more serious.This entails not only developing robust encryption techniques resilient to quantum attacks but also implementing comprehensive strategies to ensure the security of sensitive data and communication channels in the post-quantum era.In order to meet the challenges of the quantum computing era,countries are actively strengthening the implementation,migration and deployment of post-quantum cryptographic algorithms.Since the NTRU cryptographic scheme has the advantages of simple structure,high computational efficiency,small size,and no patent risk,the NTRU lattice-based key encapsulation algorithm is of great significance to the reserve and application of cryptographic technology in the post-quantum era.At the same time,the Graphics Processing Unit(GPU),with its powerful parallel computing capabilities,high throughput,low energy consumption and other characteristics,has become an important platform in the implementation of current high-concurrency cryptographic engineering.We propose the first efficient GPU implementation of the post-quantum cryptographic algorithm CTRU/CNTR.Taking into account multiple aspects such as parallel computing,memory access,data layout and algorithm optimization,the main resource occupancy of the GPU is analyzed.We use a series of computation and memory optimization techniques to accelerate parallel computing,optimize memory access,reasonably occupy GPU resources,and reduce I/O delays,thereby improving the computing power and performance of this solution.The main contributions of our work are as follows:First,for the modular reduction operation,it is implemented using the NVIDIA parallel instruction set,which effectively reduces the number of instructions required.Besides,for the time-consuming polynomial multiplication module,we employ a mixed-base Number-Theoretic Transform(NTT)and utilize methods such as layer fusion,loop unrolling,and delayed reduction to speed up

关 键 词:后量子密码 格基密码 密钥封装方案 并行处理 图形处理器 

分 类 号:TP309[自动化与计算机技术—计算机系统结构]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象