Authors: CHEN Longjie; ZHANG Yu [1,2,3,4]; ZHANG Yumei [1,2,3,4]; WU Xiaojun
Affiliations: [1] Key Laboratory of Modern Teaching Technology, Ministry of Education (Shaanxi Normal University), Xi'an, Shaanxi 710062, China; [2] Engineering Laboratory of Teaching Information Technology of Shaanxi Province (Shaanxi Normal University), Xi'an, Shaanxi 710119, China; [3] Culture, Education and Intelligent Communication Engineering Technology Research Center (Shaanxi Normal University), Xi'an, Shaanxi 710119, China; [4] School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710119, China
Source: Journal of Computer Applications (《计算机应用》), 2019, No. 2, pp. 354-359 (6 pages)
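Funding: National Natural Science Foundation of China (11772178; 61741208; 11502133); Fundamental Research Funds for the Central Universities (GK201801004; GK201803089; GK201703082); Natural Science Foundation of Shaanxi Province (2017JQ6074); National Key Research and Development Program of China (2017YFB1402102); Natural Science Basic Research Plan of Shaanxi Province (2017JM6103; 2017JM6060); Shaanxi Normal University 2017 University-Level Comprehensive Teaching Reform Research Project (17JG33)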
Abstract: To address the poor description of image details, the insufficient use of image features, and the single-level recurrent neural network structure in image caption generation, an image caption generation algorithm based on multi-attention and multi-scale feature fusion is proposed. The algorithm uses a pre-trained object detection network to extract image features from different layers of a convolutional neural network and feeds these features, level by level, into a multi-attention structure. Each attention structure is connected in turn to a layer of a multi-layer recurrent neural network, forming a multi-level image caption generation model. Residual connections added to the multi-layer recurrent neural network improve network performance and avoid the degradation caused by deepening the network. On the MSCOCO test set, the proposed algorithm achieves BLEU-1 and CIDEr scores of 0.804 and 1.167, clearly outperforming the top-down image caption generation algorithm based on a single attention structure. Manual comparison also shows that the captions generated by the proposed algorithm capture image details better.
Keywords: long short-term memory network; image caption; multi-attention mechanism; multi-scale feature fusion; deep neural network
Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
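
The abstract describes attention applied to image features at several scales, with each attention output feeding one layer of a residual recurrent stack. The following is only a toy sketch of that data flow, not the authors' implementation: it assumes dot-product soft attention, replaces the LSTM layers with a simple tanh update, and uses made-up feature values. All function names (`attend`, `residual_layer`, `multi_level_step`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, query):
    """Soft attention: score each region feature by its dot product with the query,
    then return the weighted average of the features and the weights."""
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


def residual_layer(hidden, context):
    """Toy stand-in for a recurrent layer (assumption: not an LSTM).
    The layer output is added back to its input, i.e. a residual connection."""
    transformed = [math.tanh(h + c) for h, c in zip(hidden, context)]
    return [h + t for h, t in zip(hidden, transformed)]


def multi_level_step(feature_levels, hidden):
    """One decoding step: one attention module and one residual layer per feature scale."""
    all_weights = []
    for features in feature_levels:
        context, weights = attend(features, hidden)
        hidden = residual_layer(hidden, context)
        all_weights.append(weights)
    return hidden, all_weights


# Made-up features: three regions at a shallow scale, two at a deep scale.
shallow = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
deep = [[0.2, 0.8], [0.9, 0.1]]
hidden, weights = multi_level_step([shallow, deep], [0.1, 0.3])
```

Here `hidden` keeps the dimensionality of the initial state, and `weights` holds one attention distribution per feature scale, which is the part a real model would inspect to see which regions informed each layer.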