Authors: RAN Jingnan (冉敬楠); NI Wei (倪伟); CHEN Shiyu (陈世宇) (School of Microelectronics, Hefei University of Technology, Hefei 230601, China)
Affiliation: [1] School of Microelectronics, Hefei University of Technology, Hefei 230601, Anhui, China
Source: Journal of Hefei University of Technology (Natural Science), 2024, No. 9, pp. 1159-1169 (11 pages)
Funding: National Key Research and Development Program of China (2018YFB2202604).
Abstract: To meet the low-power, low-latency, and high-accuracy requirements of decision computation in autonomous driving, this paper designs a hardware accelerator for a deep reinforcement learning based autonomous driving decision algorithm that supports mixed-precision operation. The multiply-and-accumulate unit (MAC) is built by reconfiguring multiple arithmetic units, so it supports several precision modes; this makes the accelerator more flexible and lowers the cost of deploying quantized models. A multi-level dataflow optimization increases data reuse and improves the accelerator's energy efficiency. Tested on the stochastic latent actor-critic (SLAC) autonomous driving decision algorithm, the accelerator reaches an effective throughput of 18.3 GOPS, 10.7 times that of a CPU and 3.3 times that of a GPU, and an energy efficiency of 2.197 GOPS/W, 104 times that of the CPU and 28 times that of the GPU. The paper also proposes a most significant bit data coding (MSB-DC) method that enables intra-layer mixed-precision feature map computation; experiments show that it effectively reduces quantization error at a small latency cost.
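The abstract does not describe the internal structure of the MAC. As a hypothetical illustration of what building a multi-precision MAC "by reconfiguring multiple arithmetic units" can mean, the minimal Python sketch below assembles one signed INT8 by INT8 product from four 4-bit lane multiplications and lets the same lanes serve an INT4 mode. The function names (`mul4`, `split8`, `mul8`, `mac`) and the INT8/INT4 mode pair are assumptions chosen for illustration, not the authors' design.

```python
from typing import List, Tuple

# Hypothetical sketch, not the paper's MAC: one INT8 x INT8 multiply is built
# from four 4-bit "lane" multipliers, and the same lanes also serve an INT4 mode.


def mul4(a: int, b: int) -> int:
    """One 4-bit lane. Each operand is a signed nibble (-8..7) or an unsigned nibble (0..15)."""
    if not (-8 <= a <= 15 and -8 <= b <= 15):
        raise ValueError("operand does not fit a 4-bit lane")
    return a * b


def split8(x: int) -> Tuple[int, int]:
    """Split a signed 8-bit value into (signed high nibble, unsigned low nibble)."""
    if not -128 <= x <= 127:
        raise ValueError("not a signed 8-bit value")
    high = x >> 4   # arithmetic shift keeps the sign: -8..7
    low = x & 0xF   # low nibble read as an unsigned value: 0..15
    return high, low  # x == high * 16 + low


def mul8(a: int, b: int) -> int:
    """Signed INT8 product assembled from four lane products plus shifts."""
    ah, al = split8(a)
    bh, bl = split8(b)
    return ((mul4(ah, bh) << 8)                      # high x high, weight 2^8
            + ((mul4(ah, bl) + mul4(al, bh)) << 4)   # cross terms, weight 2^4
            + mul4(al, bl))                          # low x low, weight 1


def mac(acts: List[int], weights: List[int], mode: str, acc: int = 0) -> int:
    """Accumulate sum(a * w) in 'int8' mode (four lanes per product) or
    'int4' mode (one lane per product; this model runs them sequentially)."""
    if mode not in ("int8", "int4"):
        raise ValueError(f"unknown mode {mode!r}")
    if len(acts) != len(weights):
        raise ValueError("activation and weight vectors differ in length")
    for a, w in zip(acts, weights):
        if mode == "int8":
            acc += mul8(a, w)
        else:
            if not (-8 <= a <= 7 and -8 <= w <= 7):
                raise ValueError("INT4 mode expects signed 4-bit operands")
            acc += mul4(a, w)
    return acc


print(mac([100, -128], [-7, 127], "int8"))  # -16956
print(mac([1, -2, 3], [4, 5, -6], "int4"))  # -24
```

In INT8 mode the four lanes cooperate on a single product; in INT4 mode a hardware version could instead issue four independent products to the same lanes in one cycle, which is where a reconfigurable MAC typically gains throughput on lower-precision layers. The sketch models only the arithmetic, not timing or the dataflow.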
Keywords: deep reinforcement learning; autonomous driving; mixed precision; neural network quantization; hardware acceleration
CLC number: TN47 [Electronics and Telecommunications: Microelectronics and Solid-State Electronics]