Image description method based on object position relationships in a scene (基于场景中物体位置关系的图像描述方法)

Authors: YANG Lu (杨璐); QIAN Yi (钱艺); WEN Yimin (文益民) (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China)

Affiliation: [1] School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Source: Journal of Guilin University of Electronic Technology (《桂林电子科技大学学报》), 2024, No. 6, pp. 560-567 (8 pages)

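Funding: Guangxi Key Research and Development Program (桂科AB21220023); National Natural Science Foundation of China (61866007); Guangxi Key Laboratory of Image and Graphic Intelligent Processing Fund (GIIP2005).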

Abstract: Image description aims to convert image content into natural language and remains a challenging, unsolved multimodal generation task. Most existing methods pay little attention to the position information implicit in an image, so the positional relationships between objects are difficult to describe accurately. To address this problem, an image description method based on object position relationships in the scene is proposed, built on a position relationship encoder-combine decoder (PRCO) structure. First, an object relationship scene graph is constructed from graph node features, and a position relationship encoder performs an initial encoding of the node features. Second, a common sense dictionary and a reasoning module are proposed to calculate the degree of proportional imbalance between objects, and this value is used to perform a secondary encoding of the object relationship nodes. Third, a combine decoder is designed to process the encoded information, using an erasing module and a bias gate mechanism to further optimize the node features in the graph. Finally, the text description of the image is generated. The method is validated experimentally on two public datasets, MSCOCO and Visual Genome, where it improves on existing methods on every evaluation metric, with a notable gain in CIDEr, in particular on the Visual Genome test set. The source code is available on Gitee: https://gitee.com/ymw12345/PRCO.
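The abstract does not state how the degree of proportional imbalance between objects is computed. Purely as an illustration of the idea, the sketch below assumes a small dictionary of typical real-world heights per category and compares the observed ratio of bounding-box heights with the ratio that dictionary predicts. The dictionary `TYPICAL_HEIGHT_M`, its values, the function `imbalance_degree`, and the log-ratio measure are all hypothetical and are not taken from PRCO or its source code.

```python
import math

# Hypothetical common-sense dictionary: typical real-world height in metres.
# The categories and values are illustrative assumptions, not from the paper.
TYPICAL_HEIGHT_M = {"person": 1.7, "dog": 0.5, "car": 1.5}


def box_height(box):
    """Return the pixel height of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    height = y2 - y1
    if height <= 0:
        raise ValueError("box must have positive height")
    return height


def imbalance_degree(label_a, box_a, label_b, box_b):
    """Absolute log-ratio between observed and expected size ratios.

    0.0 means the apparent sizes agree with the dictionary; larger values
    mean a stronger mismatch, for example because one object is much
    closer to the camera than the other.
    """
    observed = box_height(box_a) / box_height(box_b)
    expected = TYPICAL_HEIGHT_M[label_a] / TYPICAL_HEIGHT_M[label_b]
    return abs(math.log(observed / expected))


if __name__ == "__main__":
    person = (10, 0, 60, 340)   # 340 px tall
    dog = (100, 290, 160, 340)  # 50 px tall
    print(imbalance_degree("person", person, "dog", dog))
```

Under these assumptions a value like this could serve as an extra scalar attached to a relationship node before a second encoding pass. The absolute log-ratio is chosen here only because it is symmetric in the two objects; the measure actually used by the authors may differ.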

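Keywords: image description; graph convolutional network; long short-term memory network; position relationship encoder; combine decoder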

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
