Knowledge-aided Image Captioning

Authors: LI Zhixin[1], SU Qiang[1]

Affiliation: [1] Guangxi Key Lab of Multi-source Information Mining and Security (Guangxi Normal University), Guilin, Guangxi 541004, China

Source: Journal of Guangxi Normal University (Natural Science Edition), 2022, No. 5, pp. 418-432 (15 pages)

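Funding: National Natural Science Foundation of China (61966004, 61866004); Guangxi Natural Science Foundation (2019GXNSFDA245018); Special Fund of the Guangxi "Bagui Scholar" Program.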

Abstract: Automatically generating human-like descriptions for a given image is one of the important tasks in artificial intelligence. Most existing attention-based methods explore the mapping between words in a sentence and regions in an image, but this hard-to-predict matching sometimes produces inharmonious alignments between the two modalities and thereby lowers the quality of the generated captions. To address this problem, this paper proposes a text-related word attention that improves the correctness of visual attention. As the model generates a caption word by word, this special word attention emphasizes the importance of different words and makes full use of the internal annotation knowledge in the training data to assist the computation of visual attention. In addition, to reveal implicit information in an image that machines cannot express directly, knowledge extracted from external knowledge graphs is injected into the encoder-decoder framework to generate more novel and natural captions. Experiments on the MSCOCO and Flickr30k image captioning benchmarks show that the method achieves good performance and outperforms many existing state-of-the-art methods.
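The abstract does not give the model's equations, so the following is only a rough illustration. This minimal Python sketch shows generic dot-product soft attention, as commonly used in attention-based captioning decoders, plus one hypothetical way an attention over previously generated words could condition the visual attention query. The fusion-by-addition step, the dot-product scoring, and all function names (`attend`, `word_guided_visual_attention`) are assumptions for illustration, not the authors' formulation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Sequence[float],
           items: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Dot-product soft attention: returns weights over items and their weighted sum."""
    weights = softmax([dot(query, v) for v in items])
    dim = len(items[0])
    context = [sum(w * v[d] for w, v in zip(weights, items)) for d in range(dim)]
    return weights, context


def word_guided_visual_attention(
    hidden: Sequence[float],
    prev_words: Sequence[Sequence[float]],
    regions: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical sketch: word attention first, then visual attention."""
    # 1) Word attention: weight the embeddings of already generated words
    #    by their relevance to the current decoder state.
    word_weights, word_ctx = attend(hidden, prev_words)
    # 2) Assumed fusion: add the textual context to the decoder state
    #    to form the query used for visual attention.
    query = [h + w for h, w in zip(hidden, word_ctx)]
    # 3) Visual attention over image region features.
    region_weights, visual_ctx = attend(query, regions)
    return word_weights, region_weights, visual_ctx


if __name__ == "__main__":
    hidden = [1.0, 0.0]                                # toy decoder state
    prev_words = [[1.0, 0.0], [0.0, 1.0]]              # toy word embeddings
    regions = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]     # toy region features
    ww, rw, ctx = word_guided_visual_attention(hidden, prev_words, regions)
    # Region 0 receives the largest weight for this toy input.
    print(ww, rw, ctx)
```

Keeping the two attention steps separate makes it easy to swap in a different scoring function (for example, additive attention) or a different fusion of textual and visual context without touching the rest of the decoder.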

Keywords: image captioning; internal knowledge; external knowledge; word attention; knowledge graph; reinforcement learning

CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]