Authors: ZHAN Yimeng (詹逸梦), HU Xiao (扈啸)[1], GUO Yang (郭阳)[1]
Affiliation: [1] College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2023, No. 2, pp. 71-78 (8 pages)
Funding: National Science and Technology Major Project (2017-V-0014-0066).
Abstract: The two-dimensional FFT is a typical image-processing algorithm, widely used in image filtering, fast convolution, target tracking, and other fields. To meet the real-time processing requirements of high-resolution images, a multi-core parallel implementation of the 2D FFT is proposed for the self-developed FT-X many-core DSP. Based on the many-core programming model, task initialization is completed through multi-core task deployment and address space remapping; data-parallel processing on 24 cores is achieved, with a speedup of 19.8×. On this basis, an implicit transpose scheme based on DMA strided transfer is proposed, which uses matrix address allocation to overcome the limited stride length available for strided transfers of large matrices. Experimental results show that, at a data size of 8 K × 8 K, the scheme saves 91% and 65% of the transpose time relative to direct transpose and instruction-based implicit transpose, respectively. A multi-core load imbalance that arises in a particular special case is also identified and resolved, reducing the gap in per-core run time from 64% to 12% and the overall run time by 26%.
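Keywords: 2D FFT; multi-core parallelism; transpose; DMA strided transfer; load balancing
CLC number: TN911.73 [Electronics and Telecommunications: Communication and Information Systems]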
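The abstract describes a row-column 2D FFT whose rows are divided among cores, with a transpose between the two 1D passes. This record does not include the authors' code, so the following is only a minimal NumPy sketch of that general decomposition, not the FT-X implementation: the function name `fft2_row_column` and the `num_workers` parameter are hypothetical, the "workers" are simulated by a sequential loop over row blocks, and the transpose is an ordinary in-memory copy rather than a DMA strided transfer.

```python
import numpy as np


def fft2_row_column(x, num_workers=24):
    """Row-column 2D FFT with rows partitioned into per-worker blocks.

    Hypothetical illustration: each block stands in for the share of rows
    one core would process; here the blocks are handled one after another.
    """
    x = np.asarray(x, dtype=np.complex128)
    rows, cols = x.shape

    # Pass 1: 1D FFT along every row, one block of rows per worker.
    stage1 = np.empty_like(x)
    for block in np.array_split(np.arange(rows), num_workers):
        if block.size:  # skip empty blocks when workers outnumber rows
            stage1[block] = np.fft.fft(x[block], axis=1)

    # Transpose so the original columns become contiguous rows.
    # (Assumption: a plain copy here; the paper instead folds this step
    # into DMA strided transfers.)
    transposed = np.ascontiguousarray(stage1.T)

    # Pass 2: 1D FFT along every row of the transposed matrix,
    # i.e. along every column of the original matrix.
    stage2 = np.empty_like(transposed)
    for block in np.array_split(np.arange(cols), num_workers):
        if block.size:
            stage2[block] = np.fft.fft(transposed[block], axis=1)

    # Transpose back to the original orientation.
    return stage2.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((64, 48))
    # Check the sketch against NumPy's built-in 2D FFT.
    print(np.allclose(fft2_row_column(image), np.fft.fft2(image)))
```

When the row count is not a multiple of the worker count, `np.array_split` produces blocks of unequal size, a simple way to see how uneven per-core workloads can appear.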