An FPGA-Accelerated High-Performance Data Decompression Method

Times cited: 5


Authors: LIU Pu-Guang (刘谱光), WEI Zi-Ling (魏子令), HUANG Cheng-Long (黄成龙), CHEN Shu-Hui (陈曙晖)[1]

Affiliations: [1] College of Computer Science and Technology, National University of Defense Technology, Changsha 410073; [2] Artificial Intelligence Innovation Center, National Innovation Institute of Defense Technology, Academy of Military Sciences, Beijing 100166

Source: Chinese Journal of Computers (《计算机学报》), 2023, No. 12, pp. 2687-2704 (18 pages)

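Funding: Supported by the National Natural Science Foundation of China (62202486, 61972412, U22B2005, 12102468) and the Scientific Research Project of National University of Defense Technology (ZK21-02).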

Abstract: In application scenarios that are sensitive to data read performance, such as databases, deep learning, and efficient storage, decompression performance has a significant impact on the quality of service of upper-layer applications. The LZ4 lossless compression algorithm is known for its high decompression speed and is therefore widely used where fast decompression is required, but running it consumes substantial CPU resources. To reduce the overhead of LZ4 decompression, academia and industry have proposed FPGA-based LZ4 decompression accelerators. However, most existing designs process the input sequentially, one byte at a time, which severely limits their parallelism and throughput. Designing a high-performance LZ4 decompression accelerator is therefore a key open problem. Targeting high-performance LZ4 decompression, this paper parallelizes LZ4 decompression at multiple levels and proposes an FPGA-accelerated high-performance LZ4 data decompression method. First, we parallelize LZ4 sequence parsing: we design and implement a parallel sequence parser based on multi-field parallel parsing, which raises throughput from one byte per cycle to multiple bytes per cycle. In addition, we optimize the high-latency length-field parsing logic in the sequence parser with a bisection-based method for rapidly parsing the maximum match length; this substantially shortens the parser's critical path and raises the clock frequency of the design by about 21% over the unoptimized version. Second, building on the parallel sequence parser, we design and implement a high-performance decompression engine. The engine decouples sequence parsing from data recovery and widens the decompression output data path, resolving the mismatch between input and output throughput during decompression. Finally, to further increase throughput, we propose a scalable multi-engine decompression accelerator and implement a prototype of a heterogeneous end-to-end decompression acceleration system on a CPU-FPGA architecture. Experiments show that the per-cycle throughput of the proposed decompression engine is 4.1 to 6.8 times that of existing work, and that the engine reaches a decompression throughput of about 1.7 GB/s, 2.6 to 6.6 times that of existing work. End-to-end tests of the system prototype and an evaluation of its resource usage show that the proposed decompression acceleration system scales well in both throughput and resource usage.
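For readers unfamiliar with the LZ4 block format, the Python sketch below shows the byte-serial decoding that the abstract contrasts against, split into the two stages the proposed engine decouples: a sequence parser that extracts (literals, offset, match length) records, and a data recovery stage that rebuilds the output. It is a software reference model written for illustration under our own assumptions, not the authors' hardware design; the names `LZ4Sequence`, `read_length`, `parse_sequences`, and `restore` are hypothetical, and it omits the end-of-block rules and full bounds checking a production decoder must enforce.

```python
from typing import Iterator, NamedTuple, Tuple


class LZ4Sequence(NamedTuple):
    literals: bytes
    offset: int     # 0 for the final, literals-only sequence
    match_len: int  # already includes the implicit minimum match of 4


def read_length(src: bytes, pos: int, nibble: int) -> Tuple[int, int]:
    """Decode a 4-bit length field plus any extension bytes.

    A nibble of 15 means more bytes follow; each 255 byte continues the
    field and the first byte below 255 ends it."""
    length = nibble
    if nibble == 15:
        while True:
            b = src[pos]
            pos += 1
            length += b
            if b != 255:
                break
    return length, pos


def parse_sequences(src: bytes) -> Iterator[LZ4Sequence]:
    """Stage 1 (sequence parsing): walk the block one field at a time."""
    pos = 0
    while pos < len(src):
        token = src[pos]
        pos += 1
        lit_len, pos = read_length(src, pos, token >> 4)
        literals = src[pos:pos + lit_len]
        pos += lit_len
        if pos >= len(src):                     # last sequence: literals only
            yield LZ4Sequence(literals, 0, 0)
            return
        offset = src[pos] | (src[pos + 1] << 8)  # 16-bit little-endian
        pos += 2
        match_len, pos = read_length(src, pos, token & 0x0F)
        yield LZ4Sequence(literals, offset, match_len + 4)


def restore(seqs: Iterator[LZ4Sequence]) -> bytes:
    """Stage 2 (data recovery): append literals, then copy each match."""
    out = bytearray()
    for seq in seqs:
        out += seq.literals
        if seq.match_len:
            if not 0 < seq.offset <= len(out):
                raise ValueError("invalid match offset")
            start = len(out) - seq.offset
            for i in range(seq.match_len):  # byte-wise copy handles overlap
                out.append(out[start + i])
    return bytes(out)


# Token 0x35: 3 literals ("abc"), then a match of 5 + 4 = 9 bytes at offset 3.
block = bytes([0x35]) + b"abc" + b"\x03\x00" + bytes([0x00])
print(restore(parse_sequences(block)))  # b'abcabcabcabc'
```

Note how every field boundary depends on the previous field's value; that serial dependency is what a one-byte-per-cycle parser inherits, and what a multi-field parallel parser has to break.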
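The abstract does not describe the bisection-based length parser in detail, so the following is only an illustrative sketch of the general idea under our own assumptions. Suppose a hardware parser sees a fixed-width window of bytes at once, and the extension bytes of an LZ4 length field run until the first byte below 255. Because the predicate "bytes 0..k are all 255" is monotone in k, the terminating byte can be located by binary search in about log2(W) steps for a W-byte window, rather than by a W-step chain; a log-depth structure of this kind is presumably what shortens the critical path. The function name `ext_length_in_window` and the window abstraction are hypothetical.

```python
from itertools import accumulate
from operator import and_
from typing import Optional, Tuple


def ext_length_in_window(window: bytes) -> Optional[Tuple[int, int]]:
    """Locate the end of an LZ4 length extension starting at window[0].

    Returns (extra_length, bytes_used), or None if every byte in the window
    is 255, meaning the field continues into the next window."""
    # all_ff[k] is True iff window[0..k] are all 0xFF. This prefix predicate
    # is monotone (True...True False...False), so it can be bisected.
    all_ff = list(accumulate((b == 0xFF for b in window), and_))
    lo, hi = 0, len(window)
    while lo < hi:  # find the first k where all_ff[k] is False
        mid = (lo + hi) // 2
        if all_ff[mid]:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(window):
        return None
    # lo bytes of value 255 precede the terminating byte window[lo]
    return 255 * lo + window[lo], lo + 1
```

For example, a window beginning `FF FF 07` gives an extra length of 2 × 255 + 7 = 517 spread over 3 bytes, the same value a byte-serial loop accumulates one byte at a time.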
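One common way to scale decompression across several engines is to hand independent compressed blocks to different engines and concatenate the results in their original order. In the LZ4 frame format this is safe when the block-independence flag is set, because each block then refers only to its own data. The abstract does not say how the authors partition work across their engines, so the sketch below is an assumption-labelled software analogue of the dataflow only: a thread pool stands in for the engines, and `decompress_multi_engine` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def decompress_multi_engine(blocks: List[bytes],
                            decompress_block: Callable[[bytes], bytes],
                            num_engines: int = 4) -> bytes:
    """Decompress independent blocks in parallel and reassemble in order.

    Assumes every block can be decoded without data from earlier blocks."""
    with ThreadPoolExecutor(max_workers=num_engines) as engines:
        # Executor.map yields results in input order, playing the role of a
        # reorder stage that restores block order after parallel completion.
        return b"".join(engines.map(decompress_block, blocks))
```

Any per-block decoder can be passed as `decompress_block`. The thread pool models only how work is distributed and reordered; it makes no claim about the throughput of the FPGA design.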

Keywords: data decompression acceleration; parallel design; field-programmable gate array (FPGA); LZ4 algorithm

CLC number: TP302 [Automation and Computer Technology: Computer System Architecture]

 
