Parallel Implementation and Optimization of SM4-CTR Algorithm Based on General Computing Platform

Cited by: 4

Authors: LI Xiao-Dong[1]; HU Yi-Ming; CHI Ya-Ping[1]; QIAN Rong; ZHANG Jian-Yi (Beijing Electronic Science and Technology Institute, Beijing 100070, China)

Affiliation: [1] Beijing Electronic Science and Technology Institute, Beijing 100070, China

Source: Journal of Cryptologic Research (《密码学报》), 2022, No. 4, pp. 663-676 (14 pages)

Funding: National Key Research and Development Program of China (2018YFB1004100).

Abstract: With the rapid development of big data, cloud computing, and 5G communication technologies, data transmission security has become an increasingly prominent concern, and the design and efficient implementation of cryptographic algorithms have become particularly important. Chinese national cryptographic algorithms that can run at high speed have become key to protecting national security. Meanwhile, the GPU, a hardware device originally used only for image computation, has become a general-purpose and widely available computing resource since the release of the CUDA programming model. Based on a general-purpose computer platform, this paper proposes a parallel implementation and optimization scheme that uses the platform's local GPU for high-speed encryption and decryption with the SM4 algorithm in CTR mode. Experiments show that the proposed SM4-CTR parallel scheme effectively improves the running efficiency of SM4: on a general-purpose computer platform it achieves a 40-fold speedup and an encryption and decryption rate of 14.192 Gbps. The experiments also analyze the effect of thread block partitioning on GPU parallel acceleration: the optimal thread block size is between 128 and 512 and must be an integer multiple of 32. Finally, the experimental results are compared with the optimized SM4 schemes of other teams, including CPU-optimized and GPU-optimized schemes in traditional modes of operation and fast software implementations. The comparison shows that even when the other teams' schemes run on better hardware than the experimental environment used here, the scheme proposed in this paper is still substantially faster. The scheme therefore improves security and computing speed while applying to a wider range of platforms, and the authors expect it to play a significant role in protecting big data and personal data in practice.
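
The abstract does not show code, so the following is only a minimal Python sketch of why CTR mode suits one-thread-per-block GPU execution: each 16-byte block's keystream depends only on the key, a nonce, and the block index, so blocks can be processed independently and in any order. Everything named here is an assumption for illustration and not the authors' implementation: `toy_block_encrypt` is a hash-based stand-in that is NOT SM4 and not secure, the nonce-plus-index counter layout is assumed, and `launch_config` is a hypothetical helper that mirrors the multiple-of-32 constraint on thread block size (32 being the warp size on NVIDIA GPUs).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_BYTES = 16  # SM4 operates on 128-bit blocks


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for SM4 block encryption (NOT SM4, not secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def counter_block(nonce: bytes, index: int) -> bytes:
    """Assumed layout: 8-byte nonce followed by the 8-byte big-endian block index."""
    return nonce + index.to_bytes(8, "big")


def ctr_block(key: bytes, nonce: bytes, data: bytes, index: int) -> bytes:
    """Process block `index` alone, as a single GPU thread would."""
    chunk = data[index * BLOCK_BYTES:(index + 1) * BLOCK_BYTES]
    keystream = toy_block_encrypt(key, counter_block(nonce, index))
    # zip stops at the shorter input, so a final partial block needs no padding
    return bytes(a ^ b for a, b in zip(chunk, keystream))


def num_blocks(data: bytes) -> int:
    """Ceiling division: number of 16-byte blocks covering `data`."""
    return -(-len(data) // BLOCK_BYTES)


def ctr_serial(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Reference version: blocks processed one after another."""
    return b"".join(ctr_block(key, nonce, data, i) for i in range(num_blocks(data)))


def ctr_parallel(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    """Same result, but blocks are handed to a pool of workers independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda i: ctr_block(key, nonce, data, i),
                         range(num_blocks(data)))
        return b"".join(parts)  # map preserves input order


def launch_config(total_blocks: int, threads_per_block: int = 256):
    """Hypothetical helper: grid size for one thread per cipher block."""
    if threads_per_block % 32 != 0:
        raise ValueError("threads_per_block must be a multiple of 32")
    grid = -(-total_blocks // threads_per_block)  # ceiling division
    return grid, threads_per_block
```

Because CTR encryption and decryption are the same XOR with the keystream, calling `ctr_parallel` on the ciphertext with the same key and nonce recovers the plaintext. The Python thread pool only illustrates the independence of blocks; it is not a performance model for the GPU results reported in the abstract.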

Keywords: SM4 algorithm; CTR mode; CUDA; GPU acceleration; parallel algorithm; general-purpose computer platform

Classification number: TP309.7 [Automation and Computer Technology: Computer System Architecture]

 
