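Affiliations: [1] Laboratory of High-Speed Circuits and Neural Networks, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083; [2] 威富集团 Joint Laboratory of Image Cognitive Computing, Beijing 100083; [3] School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049
Source: Chinese Journal of Computers (《计算机学报》), 2022, No. 10, pp. 2080-2092 (13 pages)
Funding: Supported by the National Natural Science Foundation of China (61901436).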
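Abstract: The Detection Transformer (DETR) object detection method, proposed by Facebook AI researchers in 2020, uses a simple encoder-decoder structure and treats object detection as a set prediction problem. The algorithm is simple and general and avoids much hand-crafted design and parameter tuning, which has attracted wide attention from academia and industry. However, DETR restricts the resolution of its input features and lacks relative position information during detection inference, so it performs poorly on small objects and occluded objects. To address this, and inspired by brain cognition, this paper proposes a full-inference object detection network based on capsule inference and residual augmentation: Capsule-Inferenced and Residual-Augmented Detection Transformers (CIRA_DETR). First, an inter-layer residual information enhancement module is built, which uses the differences between large and small scales to enhance the information in small-scale feature maps, improving small-object detection by 1.8%. Next, to come closer to the way the human brain reasons and to better model the hierarchical relationships of internal knowledge representations in neural networks, a capsule inference module is introduced into the inference over the Transformer's output to mine entity information. Bidirectional attention routing passes information forward and feeds information back, and the categories and positions of objects in the image are predicted on that basis, which effectively reduces the difficulty of detecting occluded objects. Finally, in the mapping of object information, a nonlinear hyper-sausage mapping function is introduced, which allows flexible construction of hypersurfaces and effectively expresses the mapping between features and object categories and positions. Test results on the COCO dataset validate the CIRA_DETR model: its average precision on small, medium, and large objects reaches 25.8%, 48.7%, and 62.7%, respectively. Its small-object detection performance is comparable to that of Faster-RCNN, and the visualizations and performance metrics also show the advantage of CIRA_DETR over the conventional DETR model in detecting occluded objects.

In recent years, deep learning has been applied to many image processing tasks. Its excellent performance has encouraged many researchers to apply it to object recognition, in several popular directions: improving detection accuracy by updating the network structure, designing simple network models based on the Transformer, and obtaining better detection results through the characteristic analysis of the gantry. The DETR object detection model, proposed by Facebook AI researchers in 2020, uses a simple encoder-decoder structure and views object detection as a direct set prediction problem. The DETR model is simple and general and avoids many manual design and tuning problems, attracting widespread attention from academia and industry. However, the DETR model limits the size of the input feature map, and too small a size leads to insufficient object information. Although the performance of the model has been improved to a certain extent, its detection of small targets and occluded targets is not ideal. In detecting small and occluded targets, the entity information corresponding to features and the relative position information between entities are key to target reasoning. However, in the DETR model, the feedforward neural network (FFN) performs target information reasoning only through weighted summation and does not consider the interactive information between features, which has become the main factor limiting detection performance. In contrast, humans can easily detect small targets and occluded targets. To solve these problems, inspired by brain cognition, we propose a novel full-inference model called Capsule-inference and residual-augmented DETR (CIRA_DETR). Firstly, CIRA_DETR establishes an inter-layer residual information enhancement module that enhances the target-related information in the small-scale map by calculating the differences between the large- and small-scale feature maps, which can improve the convergence speed and detection perfo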
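Keywords: object detection; DETR; Transformer; capsule network; brain neuroscience; residual network
Classification: TP319 [Automation and Computer Technology: Computer Software and Theory]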