Image caption generation algorithm based on multi-attention and multi-scale feature fusion

Times cited: 17

Authors: CHEN Longjie; ZHANG Yu [1,2,3,4]; ZHANG Yumei [1,2,3,4]; WU Xiaojun

Affiliations: [1] Key Laboratory of Modern Teaching Technology, Ministry of Education (Shaanxi Normal University), Xi'an, Shaanxi 710062, China; [2] Engineering Laboratory of Teaching Information Technology of Shaanxi Province (Shaanxi Normal University), Xi'an, Shaanxi 710119, China; [3] Culture, Education and Intelligent Communication Engineering Technology Research Center (Shaanxi Normal University), Xi'an, Shaanxi 710119, China; [4] School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710119, China

Source: Journal of Computer Applications (《计算机应用》), 2019, No. 2, pp. 354-359 (6 pages)

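Funding: National Natural Science Foundation of China (11772178; 61741208; 11502133); Fundamental Research Funds for the Central Universities (GK201801004; GK201803089; GK201703082); Natural Science Foundation of Shaanxi Province (2017JQ6074); National Key Research and Development Program of China (2017YFB1402102); Natural Science Basic Research Program of Shaanxi Province (2017JM6103; 2017JM6060); Shaanxi Normal University 2017 University-Level Comprehensive Teaching Reform Research Project (17JG33)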

Abstract: Image caption generation suffers from poorly described image details, insufficient use of image features, and a single-level recurrent neural network (RNN) structure. To address these issues, an image caption generation algorithm based on multi-attention and multi-scale feature fusion is proposed. A pre-trained object detection network extracts image features from different layers of a convolutional neural network (CNN). These features are fed, layer by layer, into a multi-attention structure, and the attention structures are connected in turn to a multi-layer RNN, yielding a multi-level image caption generation model. Residual connections are added to the multi-layer RNN to improve performance and to effectively avoid the network degradation caused by deepening the network. On the MSCOCO test set, the proposed algorithm achieves BLEU-1 and CIDEr scores of 0.804 and 1.167 respectively, clearly outperforming the top-down image caption generation algorithm based on a single attention structure. Manual inspection and comparison show that the captions generated by the proposed algorithm describe image details better.
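To make the described data flow concrete, the following is a minimal sketch, not the authors' implementation, of one decoding step in which each recurrent layer has its own attention over the features of one scale, and each layer's output is added back to its input through a residual connection. The abstract does not specify the attention form, the recurrent cell, the feature dimensions, or how scales map to layers, so several assumptions are made: scaled dot-product attention is swapped in (the paper may use a different scoring function), each layer's attention is queried by that layer's previous hidden state, all vectors share one dimension, and `toy_cell` is a hypothetical parameter-free stand-in for an LSTM cell. All function names are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features: List[Vector], query: Vector) -> Vector:
    """Scaled dot-product soft attention (ASSUMED form): weight each region
    feature by its similarity to the query, return the weighted sum."""
    d = len(query)
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, query)) / math.sqrt(d)
              for f in features]
    weights = softmax(scores)
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)]


def toy_cell(x: Vector, context: Vector, h_prev: Vector) -> Vector:
    """HYPOTHETICAL parameter-free stand-in for an LSTM cell: it only mixes
    the input, the attended context and the previous state through tanh."""
    return [math.tanh(a + b + c) for a, b, c in zip(x, context, h_prev)]


def multi_level_step(word_emb: Vector,
                     scale_features: List[List[Vector]],
                     hidden: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """One decoding step through a stack of recurrent layers.

    Layer i attends over the region features of scale i, queried by its own
    previous hidden state. Each layer's output is added to its input
    (residual connection) before being passed to the next layer.
    Returns the top-layer output and the updated hidden states.
    """
    x = word_emb
    new_hidden = []
    for feats, h_prev in zip(scale_features, hidden):
        context = soft_attention(feats, h_prev)
        h = toy_cell(x, context, h_prev)
        new_hidden.append(h)
        x = [xi + hi for xi, hi in zip(x, h)]  # residual connection
    return x, new_hidden


# Usage: two scales of 2-d features (2 regions at scale 0, 3 at scale 1),
# with both layers' hidden states initialised to zero.
feats = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]]]
out, hidden = multi_level_step([0.1, -0.2], feats, [[0.0, 0.0], [0.0, 0.0]])
```

The residual addition `x + h` is what lets a deeper stack fall back to passing its input through nearly unchanged, which is the usual motivation for using residual connections against degradation; a real model would use trained LSTM weights and project features of different CNN layers to a common dimension.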

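Keywords: long short-term memory (LSTM) network; image caption; multi-attention mechanism; multi-scale feature fusion; deep neural network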

CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
