Authors: Tian Sen, Zhuang Yaoyu, Zhang Junjie [1], Yang Dan [1] (Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Shanghai University, Shanghai 200444, China)
Affiliation: [1] Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Shanghai University, Shanghai 200444
Source: Electronic Measurement Technology (《电子测量技术》), 2020, No. 21, pp. 51-57 (7 pages)
Abstract: Sorting is a basic computing problem that is widely used in scenarios such as databases and machine learning. Sorting algorithms running on traditional general-purpose processors are constrained by the speed gap between cache and memory, so their performance gains are limited, and more and more applications turn to FPGAs for hardware acceleration. With the emergence of technologies such as machine learning and artificial intelligence, the amount of data to be processed has grown exponentially. FPGA-based sorters for large data volumes are usually merge sorters, which can sort large data sets iteratively. However, existing FPGA-based merge sorters do not support variable-length merge sort, which remains a major challenge in practical applications. To address this problem, the paper proposes a high-performance variable-length merge sort accelerator based on FPGA. The sorting structure implements variable-length merge sorting by adding control logic to the basic merge tree. So that the structure can sort large data sets of arbitrary length, the paper also proposes a novel data storage structure and read control method that keep the input queues of the variable-length merge tree ordered. To verify the correctness of the accelerator and evaluate its performance, the structure was implemented on the KCU1500 development board. When sorting 70 M double-precision floating-point records, the proposed accelerator delivers 6 times the performance of software sorting.
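The iterative merging that the abstract describes can be illustrated in software. The following is a minimal sketch only, not the paper's hardware design: it models a merge tree as a k-way merge over sorted runs and repeats merge passes until one run remains. The function names, the `fan_in` and `initial_run_len` parameters, and the use of Python's `heapq.merge` are all assumptions made for illustration. Runs are allowed to have different lengths (the last run is usually shorter), which is the situation a variable-length merge sorter has to handle.

```python
import heapq
from typing import List, Sequence


def merge_pass(runs: List[List[float]], fan_in: int) -> List[List[float]]:
    """One merge pass: merge each group of up to `fan_in` sorted runs.

    Groups may contain fewer than `fan_in` runs, and the runs themselves
    may have different lengths.
    """
    merged = []
    for start in range(0, len(runs), fan_in):
        group = runs[start:start + fan_in]
        # heapq.merge performs a k-way merge of already-sorted inputs.
        merged.append(list(heapq.merge(*group)))
    return merged


def iterative_merge_sort(data: Sequence[float],
                         initial_run_len: int = 4,
                         fan_in: int = 4) -> List[float]:
    """Sort `data` by repeated merge passes (hypothetical software model)."""
    if fan_in < 2 or initial_run_len < 1:
        raise ValueError("fan_in must be >= 2 and initial_run_len >= 1")
    # Build the initial sorted runs; the final run may be shorter.
    runs = [sorted(data[i:i + initial_run_len])
            for i in range(0, len(data), initial_run_len)]
    if not runs:
        return []
    # Each pass reduces the number of runs by roughly a factor of fan_in.
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
    return runs[0]
```

Calling `iterative_merge_sort(values, initial_run_len=4, fan_in=4)` on a list whose length is not a multiple of 4 exercises the unequal-run case. In this model, a larger `fan_in` means fewer passes over the data.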
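Classification: TN802 [Electronics and Telecommunications: Information and Communication Engineering]; TP331 [Automation and Computer Technology: Computer System Architecture]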