Authors: LI Bei (李倍), MIN Feng (闵丰), YANG Jun (杨军)[1], LIANG Ke (梁科)[1], LI Guofeng (李国峰)[1]
Affiliations: [1] Integrated Circuit and System Integration Laboratory of Nankai University, Tianjin Key Laboratory of Optoelectronic Sensor and Sensing Network Technology, Tianjin 300350, China; [2] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2021, No. 8, pp. 53-58 (6 pages)
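Funding: National Natural Science Foundation of China (62004198); Beijing Natural Science Foundation (4194092); National Key Research and Development Program of China (2018AAA0102505).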
Abstract: Current neural network accelerators cannot efficiently perform the bounding-box post-processing required by object tracking, so a dedicated object tracking accelerator is proposed. A neural network architecture is introduced to extract features from the input view and to generate the sets of bounding-box confidences and position offsets. A dedicated acceleration module is then designed for the bounding-box operations of regression, penalty calculation, and extraction. By synchronizing data between the neural network accelerator and the dedicated module, feature extraction and bounding-box operations execute in parallel in a pipelined structure, which achieves end-to-end processing for deep-learning-based object tracking. The accelerator occupies an area of 3.64 mm² in the SMIC 40 nm process and achieves an energy efficiency of 5.71 TOPS/W. Experimental results show that, compared with existing acceleration solutions, the accelerator achieves a 1.53x speedup and supports real-time video processing (31 fps). For the post-processing task of tracking alone, the dedicated module achieves a 3.2x speedup over a RISC processor.
Classification number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]