Authors: LIU Hao; YANG Xiaoshan [1]; XU Changsheng [1]
Affiliation: [1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Source: Journal of Beijing University of Aeronautics and Astronautics, 2022, No. 8, pp. 1399-1408 (10 pages)
Funding: National Key R&D Program of China (2018AAA0100604); National Natural Science Foundation of China (61720106006, 62036012, 62072455, 61721004, U1836220, U1705262); Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDJ-SSW-JSC039); Beijing Natural Science Foundation (L2010011).
Abstract: Image captioning aims to generate a natural-language description for an input image. In existing datasets, the caption sentences of most images contain a small number of common words and a large number of rare words, following a long-tail distribution. Prior work focuses on caption accuracy over the whole dataset and neglects the accurate description of the many rare words, which limits practical application. To address this problem, a long-tail image captioning model based on a dynamic semantic memory network (DSMN) is proposed; it aims to improve the description of rare nouns while keeping common nouns accurately described. DSMN dynamically mines the global semantic relationship between rare words and common words to transfer semantic knowledge from common words to rare words, and it strengthens the semantic representation and prediction of rare words by jointly considering the global word-relation information and the local semantic information of the input image and the already generated words. To evaluate long-tail image captioning effectively, a task-specific test split, Few-COCO, is constructed from the MS COCO Captioning dataset. Quantitative experiments on MS COCO Captioning and Few-COCO show that DSMN achieves a rare-word description precision of 0.6028%, a recall of 0.3234%, and an F-1 score of 0.3567% on Few-COCO, a clear improvement over the baseline methods.
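As a rough illustration of the memory-read idea summarized above (mixing a global word-relation structure into per-word memory slots, then attending over the memory with the local decoding context), the following is a minimal PyTorch sketch. The class name, dimensions, and the similarity-based relation matrix are illustrative assumptions, not the authors' DSMN implementation.

```python
# Minimal sketch of a dynamic-semantic-memory read step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSemanticMemory(nn.Module):
    """One semantic slot per vocabulary word; rare-word slots borrow from
    related common-word slots before an attention read conditioned on the
    local decoding context (image features + generated words)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(vocab_size, dim) * 0.02)  # one slot per word
        self.query_proj = nn.Linear(dim, dim)  # maps local context to a memory query

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, dim) local semantic state of the decoder.
        # 1) Global relation: similarity between word slots enables
        #    common-to-rare knowledge transfer.
        normed = F.normalize(self.slots, dim=-1)
        relation = torch.softmax(normed @ normed.t(), dim=-1)   # (V, V)
        enriched = relation @ self.slots                        # each slot mixes in related slots
        # 2) Local read: attend over the enriched memory with the current context.
        query = self.query_proj(context)                        # (batch, dim)
        attn = torch.softmax(query @ enriched.t(), dim=-1)      # (batch, V)
        return attn @ enriched                                  # (batch, dim) memory summary


if __name__ == "__main__":
    memory = DynamicSemanticMemory(vocab_size=1000, dim=256)
    summary = memory(torch.randn(4, 256))  # fuse with the decoder state before word prediction
    print(summary.shape)                   # torch.Size([4, 256])
```

In an actual captioning decoder, the returned memory summary would be fused with the hidden state at each time step before predicting the next word; the details of that fusion are not specified by the abstract.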
Keywords: deep learning; image understanding; image captioning; long-tail distribution; memory network
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]