Authors: GAO Hanyuan (高汉源), GONG Lei (宫磊), WANG Teng (王腾) (School of Data Science, University of Science and Technology of China, Hefei 230026, China; Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123, China; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China)
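Affiliations: [1] School of Data Science, University of Science and Technology of China, Hefei 230026; [2] Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215123; [3] School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026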
Source: Computer Engineering and Applications (《计算机工程与应用》), 2024, No. 18, pp. 147-157 (11 pages)
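Funding: National Key Research and Development Program of China (2022YFB4501600, 2022YFB4501603); National Natural Science Foundation of China (62102383, 61976200, 62172380); Natural Science Foundation of Jiangsu Province (BK20210123); Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y2021121).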
Abstract: Bit-level composable architectures support neural-network computation with data of multiple bit widths (precision types). Their hardware structures come in many variants, and each neural-network model requires its own hand-designed program schedule. This design process is time-consuming and labor-intensive. It hinders the rapid iteration and deployment of software and hardware, and its final effect is difficult to evaluate. Existing dataflow-modeling work lacks both a bit-level description of computation and automation. To address these problems, this paper proposes a dataflow-modeling-based method for optimizing data schedules on adaptive bit-level composable architectures. The method introduces bit-level dataflow modeling, which uses several loop primitives and a tensor-index relation matrix to describe the characteristics of the bit-level composable hardware structure and the application's data scheduling process. Data access information is extracted from the model representation and data reuse is counted, which enables fast evaluation. On top of the model, a design space exploration framework adaptively optimizes the data schedule for different applications and hardware design constraints. Design points are sampled with an index-matching method and loop transformations. Greedy pruning rules are added to improve exploration efficiency. Experiments on multiple applications under several hardware structure constraints show that the method outperforms state-of-the-art manually designed accelerators and data schedules.
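The abstract does not give the model's concrete notation, so the following Python sketch is only a rough illustration. It assumes a GEMM-style layer, C[m][n] += A[m][k] * B[k][n], whose unsigned operands are split into 2-bit slices, in the spirit of bit-level composable designs. The code checks that summing shifted slice products reproduces the full-width product. It then encodes a hypothetical tensor-index relation matrix over the loop indices m, n, k, ba and bb. From that matrix it counts each tensor's accesses, its distinct elements and its reuse factor. The names `split_slices`, `composed_multiply`, `RELATION` and `reuse_stats`, the 2-bit slice width, and the reuse formula are all assumptions, not the paper's actual model.

```python
from math import prod

# Assumed slice width of the narrow multipliers being composed (hypothetical).
SLICE_BITS = 2


def split_slices(value: int, width: int, slice_bits: int = SLICE_BITS) -> list[int]:
    """Split an unsigned `width`-bit integer into little-endian bit slices."""
    mask = (1 << slice_bits) - 1
    return [(value >> (i * slice_bits)) & mask for i in range(width // slice_bits)]


def composed_multiply(a: int, b: int, wa: int, wb: int) -> int:
    """Multiply two unsigned integers by summing shifted slice-by-slice products,
    the way a bit-level composable array fuses narrow multipliers."""
    total = 0
    for ba, sa in enumerate(split_slices(a, wa)):      # loop over A's bit slices
        for bb, sb in enumerate(split_slices(b, wb)):  # loop over B's bit slices
            total += (sa * sb) << ((ba + bb) * SLICE_BITS)
    return total


# Hypothetical tensor-index relation matrix: one row per tensor, one column per
# loop index; 1 means the tensor's address (or bit slice) depends on that index.
INDICES = ["m", "n", "k", "ba", "bb"]
RELATION = {
    "A": [1, 0, 1, 1, 0],  # A[m][k], bit slice ba
    "B": [0, 1, 1, 0, 1],  # B[k][n], bit slice bb
    "C": [1, 1, 0, 0, 0],  # C[m][n] accumulates over k, ba, bb
}


def reuse_stats(bounds: dict[str, int]) -> dict[str, tuple[int, int, int]]:
    """Return (accesses, distinct elements, reuse factor) for each tensor,
    assuming every loop iteration touches each tensor exactly once."""
    total = prod(bounds[i] for i in INDICES)
    stats = {}
    for tensor, row in RELATION.items():
        distinct = prod(bounds[i] for i, r in zip(INDICES, row) if r)
        stats[tensor] = (total, distinct, total // distinct)
    return stats
```

For example, an 8-bit by 4-bit layer with m = n = 16 and k = 32 uses the call `reuse_stats({"m": 16, "n": 16, "k": 32, "ba": 4, "bb": 2})`. It reports a reuse factor of 32 for A. That figure is the product of the bounds of n and bb, the indices A does not depend on. This "irrelevant index" view is one simple way a relation matrix can drive fast reuse estimates. The paper's own evaluation may differ.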
CLC number: TP302.1 [Automation and Computer Technology / Computer System Architecture]