Authors: LIAO Xiao-qun (廖晓群); WANG Jia-yi (王佳仪); SU Tao (苏涛) [2]; LI Min (李敏); ZHANG Mei-chun (张美春)
Affiliations: [1] School of Communication and Information Engineering, Xi'an University of Science and Technology, Xi'an 710054, Shaanxi, China; [2] National Lab of Radar Signal Processing, Xidian University, Xi'an 710071, Shaanxi, China
Source: Computer Technology and Development (《计算机技术与发展》), 2021, No. 11, pp. 101-107 (7 pages)
Funding: National Science and Technology Major Project (2012ZX01034001-001).
Abstract: The programming model of the HXDSP1042 compiler now supports byte-granular addressing as well as loads, stores, and arithmetic on 64-bit data, which is significant for improving the precision of floating-point computation. Matrix algorithms are common in radar signal processing, and matrix operations account for a large share of the work in adaptive beamforming and direction estimation. Many DSP processors today cannot automatically make full use of their own hardware architecture, so getting the compiler to handle matrix operations efficiently is particularly important. HXDSP1042 is a processor for digital signal processing and embedded applications; designing matrix operations around the chip's hardware characteristics within the HXDSP1042 instruction framework is an important step toward high-performance applications for the chip. Drawing on the characteristics of the multi-cluster VLIW instruction architecture, this paper applies parallel optimization techniques such as loop unrolling, instruction scheduling, and software pipelining to the double-precision floating-point matrix-vector multiplication function on the HXDSP1042, making full use of the chip's internal hardware resources. Experimental results show that, relative to the serial implementation before optimization, the optimized function achieves a speedup of more than 11.
Keywords: multi-cluster; single instruction, multiple data (SIMD); 64-bit data operations; software pipelining; digital signal processor
Classification: TP301 [Automation and Computer Technology: Computer System Architecture]
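
Note: The abstract names loop unrolling as one of the optimizations applied to double-precision matrix-vector multiplication. The sketch below illustrates that general idea in portable C and is not the paper's implementation: it uses no HXDSP1042 instructions or intrinsics, and the function names `matvec_serial` and `matvec_unrolled4`, the row-major layout, and the unroll factor of 4 are all assumptions made for illustration. Unrolling the inner loop into independent accumulators exposes several multiply-adds per iteration that a VLIW scheduler could, in principle, issue in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: y = A * x, with A stored row-major as rows x cols doubles. */
void matvec_serial(const double *a, const double *x, double *y,
                   size_t rows, size_t cols)
{
    assert(a && x && y);
    for (size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < cols; ++j)
            sum += a[i * cols + j] * x[j];
        y[i] = sum;
    }
}

/* Same computation with the inner loop unrolled by 4 (hypothetical factor).
 * The four accumulators carry no dependency on one another, so the four
 * multiply-adds in the loop body are candidates for parallel issue. */
void matvec_unrolled4(const double *a, const double *x, double *y,
                      size_t rows, size_t cols)
{
    assert(a && x && y);
    for (size_t i = 0; i < rows; ++i) {
        const double *row = a + i * cols;
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t j = 0;

        /* Main unrolled body: handles columns in groups of four. */
        for (; j + 4 <= cols; j += 4) {
            s0 += row[j]     * x[j];
            s1 += row[j + 1] * x[j + 1];
            s2 += row[j + 2] * x[j + 2];
            s3 += row[j + 3] * x[j + 3];
        }

        /* Combine partial sums, then finish any leftover columns. */
        double sum = (s0 + s1) + (s2 + s3);
        for (; j < cols; ++j)
            sum += row[j] * x[j];
        y[i] = sum;
    }
}
```

Because the unrolled version adds the products in a different order, its result can differ from the serial one in the last bits for general floating-point inputs; the two agree exactly only when every intermediate value is exactly representable.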