Fast Software Implementation of SM4 (SM4算法快速软件实现)

Cited by: 23


Authors: ZHANG Xiao-Cong (张笑从); GUO Hua (郭华) [1,2]; ZHANG Xi-Yong (张习勇); WANG Chuang (王闯); LIU Jian-Wei (刘建伟)

Affiliations: [1] State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China; [2] State Key Laboratory of Cryptology, Beijing 100878, China; [3] Key Laboratory of Aerospace Network Security (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China; [4] Beijing Institute of Satellite Information Engineering, Beijing 100086, China

Source: Journal of Cryptologic Research (《密码学报》), 2020, Issue 6, pp. 799-811 (13 pages)

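Funding: Beijing Natural Science Foundation (4202037); CCF-Tencent Research Fund (CCF-Tencent RAGR20200123); National Key Research and Development Program of China (2017YFB1400700); Joint Program for Scientific Research and Graduate Training (JD100060630); National Undergraduate Innovation and Entrepreneurship Training Program (201910006159, 201910006107).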

Abstract: SM4 is China's national standard symmetric block cipher. Encryption and decryption throughput is a key measure of an implementation's performance, yet little work has been published on software implementations of SM4. Combining bit-slicing with the AVX2 single instruction, multiple data (SIMD) instruction set, this paper presents a fast optimized software implementation of SM4 that uses 256-bit YMM registers to encrypt or decrypt 256 blocks in parallel. First, a new selection function is constructed from existing ones. Next, the search algorithm that generates the corresponding logic circuit is improved. Using the new selection function and the improved search algorithm, the logical expression of the S-box is simplified, reducing the number of logic gates needed to implement it from 3000 (for the minimal sum-of-products form) to 497. On an Intel Core i7-7700HQ (Kaby Lake) @ 2.80 GHz processor, the implementation reaches 2580 Mbps, 43% faster than the best previously published result of 1795 Mbps (measured on an Intel Core i7-5500U (Broadwell-U) @ 2.40 GHz). Because a bit-sliced implementation performs no table lookups in memory or cache, it resists cache-timing side-channel attacks, which improves security. The method is portable: besides x86 platforms with the AVX2 extension, it can be implemented with RISC instruction sets on embedded platforms such as ARM that are resource-constrained and have high security requirements. In addition, the new selection function and search algorithm are generic and can be applied to simplifying other general logic functions.
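To make the bit-slicing idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified setting: each of 256 "blocks" is a single 32-bit word rather than a full 128-bit SM4 block, and a Python integer stands in for a 256-bit YMM register. It applies the SM4 linear transform L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24) in bit-sliced form, where every rotation reduces to re-indexing bit planes and each XOR acts on all 256 words at once. All function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF
N_BLOCKS = 256  # one bit per block in each plane (a Python int stands in for a YMM register)


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm4_L(word):
    """Reference (non-sliced) SM4 linear transform L on one 32-bit word."""
    return (word ^ rotl32(word, 2) ^ rotl32(word, 10)
            ^ rotl32(word, 18) ^ rotl32(word, 24))


def to_bitslice(words):
    """Transpose: planes[i] collects bit i of every word; bit j of a plane belongs to words[j]."""
    planes = [0] * 32
    for j, w in enumerate(words):
        for i in range(32):
            planes[i] |= ((w >> i) & 1) << j
    return planes


def from_bitslice(planes, n):
    """Inverse transpose: rebuild n words from 32 bit planes."""
    words = [0] * n
    for i, p in enumerate(planes):
        for j in range(n):
            words[j] |= ((p >> j) & 1) << i
    return words


def sm4_L_bitsliced(planes):
    """Bit-sliced L: output bit i of (x <<< r) is input bit (i - r) mod 32,
    so rotations cost nothing and only plane-wide XORs remain."""
    return [
        planes[i]
        ^ planes[(i - 2) % 32]
        ^ planes[(i - 10) % 32]
        ^ planes[(i - 18) % 32]
        ^ planes[(i - 24) % 32]
        for i in range(32)
    ]


# Check the sliced version against the reference on 256 random words.
random.seed(0)
words = [random.getrandbits(32) for _ in range(N_BLOCKS)]
sliced_out = from_bitslice(sm4_L_bitsliced(to_bitslice(words)), N_BLOCKS)
assert sliced_out == [sm4_L(w) for w in words]
```

The same principle extends to the S-box: once it is written as a circuit of logic gates, each gate becomes one bitwise operation over a whole plane, so a smaller gate count translates directly into fewer instructions per batch of 256 blocks, and no data-dependent table lookup is needed.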

Keywords: SM4 algorithm; optimized software implementation; bit-slicing; SIMD

Classification: TP309.7 [Automation and Computer Technology: Computer System Architecture]

 
