Authors: HUANG Yong-tao; YAN Hua (School of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China)
Source: Computer Science, 2020, No. 6, pp. 133-137 (5 pages)
Funding: National Natural Science Foundation of China (61403265).
Abstract: Understanding a visual scene means not only identifying single objects in isolation but also capturing the interactions between different objects. A scene graph collects all (subject-predicate-object) tuples describing the object relationships within an image and is widely used in scene understanding tasks. However, most existing scene graph generation models have complex structures, slow inference, and low accuracy, so they cannot be used directly in practice. To address this, a scene graph generation model combining an attention mechanism and feature fusion, built on Factorizable Net, is proposed. First, an image is decomposed into subgraphs, each containing several objects and the relationships among them. Then, position and shape information is fused into the object features, and an attention mechanism is used to pass messages between object features and subgraph features. Finally, object classification and inter-object relationship inference are performed from the object features and subgraph features, respectively. Experimental results on multiple visual relationship detection datasets show that the model achieves an accuracy of 22.78% to 25.41% in visual relationship detection and 16.39% to 22.75% in scene graph generation, improvements of 1.2% and 1.8% over Factorizable Net, and that it can detect the objects in an image and their relationships within 0.6 s on a single GTX 1080Ti graphics card. These results demonstrate that the subgraph structure significantly reduces the number of image regions requiring relationship inference, while the feature fusion method and attention-based message passing strengthen the representational power of the deep features, so objects and their relationships can be predicted faster and more accurately. The model thus effectively solves the poor timeliness and low accuracy of traditional scene graph generation models.
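The two core steps of the abstract, fusing box position/shape into object features and attention-weighted message passing between object and subgraph features, can be illustrated with a minimal sketch. This is a simplified, hypothetical rendering of the idea (plain scaled dot-product attention over feature vectors, with a residual update), not the paper's actual architecture; all function names and the residual-fusion choice are assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def fuse_position_shape(appearance, box):
    """Concatenate normalized box geometry (x, y, width, height)
    onto an appearance feature vector, mirroring the abstract's
    position/shape feature-fusion step (illustrative only)."""
    x1, y1, x2, y2 = box
    return appearance + [x1, y1, x2 - x1, y2 - y1]

def attention_message_passing(obj_feats, sub_feats):
    """One round of attention-weighted message passing from subgraph
    features to object features (hypothetical sketch, not the
    paper's exact scheme)."""
    d = len(obj_feats[0])
    updated = []
    for o in obj_feats:
        # attention weights of this object over every subgraph feature
        w = softmax([sum(oi * si for oi, si in zip(o, s)) / math.sqrt(d)
                     for s in sub_feats])
        # message = attention-weighted sum of subgraph features
        msg = [sum(wi * s[k] for wi, s in zip(w, sub_feats))
               for k in range(d)]
        # residual fusion of the message into the object feature
        updated.append([oi + mi for oi, mi in zip(o, msg)])
    return updated
```

With a single subgraph the attention weight collapses to 1, so each object feature simply absorbs that subgraph feature; with several subgraphs, each object draws most strongly from the subgraphs it is most similar to.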
Keywords: scene graph; visual relationship detection; attention mechanism; message passing; feature fusion
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]