Image Captioning According to User's Intention and Style (体现用户意图和风格的图像描述生成)

Authors: WANG Yuhang (王宇航), ZHANG Canlong (张灿龙) [1], LI Zhixin (李志欣) [1], WANG Zhiwen (王智文) [2]

Affiliations: [1] Guangxi Key Lab of Multi-source Information Mining & Security (Guangxi Normal University), Guilin, Guangxi 541004, China; [2] College of Computer Science and Communication Engineering, Guangxi University of Science and Technology, Liuzhou, Guangxi 545006, China

Source: Journal of Guangxi Normal University (Natural Science Edition) (《广西师范大学学报（自然科学版）》), 2022, No. 4, pp. 91-103 (13 pages)

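Funding: National Natural Science Foundation of China (61866004, 61966004, 61962007); Guangxi Natural Science Foundation (2018GXNSFDA281009, 2019GXNSFDA245018, 2018GXNSFDA294001); Systematic Research Project Fund of the Guangxi Key Lab of Multi-source Information Mining & Security (20-A-03-01); Guangxi "Bagui Scholar" Innovative Research Team.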

Abstract: Most existing image captioning models cannot generate personalized descriptions that match a user's intention and language style. To address this problem, this paper proposes a personalized image captioning method that represents the user's intention with a fine-grained scene control graph and the user's speaking style with style control factors. First, a structured graph of the objects in the scene, their attributes, and the relationships among them is constructed; this graph controls which objects, attributes, and inter-object relationships the user wants the caption to express. Second, a multi-relational graph convolutional neural network is added to the encoder to encode the contextual information of the scene, and graph flow attention is used to control the focus of the description. Finally, a user style control module is added at sentence generation: keyword search is used to build a user profile containing gender, age, education level, and other information; the style generator extracts the corresponding style pattern according to this profile; and the language decoder outputs a personalized image caption. Experimental results on the MSCOCO and FlickrStyle datasets show that the proposed method can generate personalized and diverse image captions.
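The scene control graph described in the abstract can be pictured as a small typed graph. The following is a minimal sketch only, not the authors' implementation: the class name `SceneControlGraph`, the edge-type labels (`attr_of`, `subject_of`, `object_of`), and the helper `build_example` are all hypothetical names chosen for illustration. It shows how objects, attributes, and relationships might be stored as nodes, and how a node's neighbors could be grouped by edge type, which is the kind of input a multi-relational graph encoder would presumably consume.

```python
from dataclasses import dataclass, field

# Hypothetical node roles; the abstract names objects, attributes, and relationships.
OBJECT, ATTRIBUTE, RELATION = "object", "attribute", "relation"


@dataclass
class SceneControlGraph:
    roles: list = field(default_factory=list)  # roles[i] is the role of node i
    edges: list = field(default_factory=list)  # (source, target, edge_type) triples

    def add_node(self, role):
        """Append a node with the given role and return its index."""
        self.roles.append(role)
        return len(self.roles) - 1

    def add_object(self, n_attributes=0):
        """Add an object node plus the number of attribute slots the user asks for."""
        obj = self.add_node(OBJECT)
        for _ in range(n_attributes):
            attr = self.add_node(ATTRIBUTE)
            self.edges.append((attr, obj, "attr_of"))  # assumed edge label
        return obj

    def relate(self, subject, obj):
        """Insert a relationship node between a subject object and a target object."""
        rel = self.add_node(RELATION)
        self.edges.append((subject, rel, "subject_of"))  # assumed edge label
        self.edges.append((rel, obj, "object_of"))       # assumed edge label
        return rel

    def neighbors_by_type(self, node):
        """Group the incoming neighbors of `node` by edge type."""
        grouped = {}
        for src, dst, etype in self.edges:
            if dst == node:
                grouped.setdefault(etype, []).append(src)
        return grouped


def build_example():
    """User intent: describe one object with two attributes and its relation to a second object."""
    g = SceneControlGraph()
    first = g.add_object(n_attributes=2)
    second = g.add_object(n_attributes=0)
    g.relate(first, second)
    return g
```

In this sketch the graph carries no words, only structure: the user's intention is expressed by how many objects, attribute slots, and relationship nodes are requested, and a captioning model would be expected to ground each node in the image.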

Keywords: image captioning; user profile; fine-grained scene control; style control; attention mechanism

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
