Global image captioning method based on graph attention network


Authors: SUI Jiahong; MAO Yingchi [1,2]; YU Huimin; WANG Zicheng; PING Ping [1,2] (College of Computer and Information, Hohai University, Nanjing, Jiangsu 210098, China; Key Laboratory of Water Big Data Technology of Ministry of Water Resources (Hohai University), Nanjing, Jiangsu 210098, China; Power China Kunming Engineering Corporation Limited, Kunming, Yunnan 650051, China)

Affiliations: [1] College of Computer and Information, Hohai University, Nanjing 210098, China; [2] Key Laboratory of Water Big Data Technology of Ministry of Water Resources (Hohai University), Nanjing 210098, China; [3] Power China Kunming Engineering Corporation Limited, Kunming 650051, China

Source: Journal of Computer Applications, 2023, No. 5, pp. 1409-1415 (7 pages)

Funding: National Natural Science Foundation of China (61902110); Key Research and Development Program of Jiangsu Province (BE2020729); Science and Technology Project of Huaneng Group Headquarters (HNKJ19-H12, HNKJ20-H46).

Abstract: Existing image captioning methods consider only the spatial location features of grids, with insufficient interaction among grid features, and do not make full use of the global features of the image. To generate higher-quality image captions, a global image captioning method based on Graph ATtention network (GAT) was proposed. First, a multi-layer Convolutional Neural Network (CNN) was used for visual encoding, extracting the grid features and the whole-image feature of a given image and building a grid feature interaction graph. Then, using GAT, the feature extraction problem was transformed into a node classification problem over one global node and many local nodes, so that after updating and optimization the global and local features could be fully exploited. Finally, a Transformer-based decoding module used the improved visual features to generate the image caption. Experimental results on the Microsoft COCO dataset show that the proposed method effectively captures the global and local features of an image, reaching 133.1% on the CIDEr (Consensus-based Image Description Evaluation) metric. Thus, the GAT-based global image captioning method effectively improves the accuracy of textual image descriptions, enabling classification, retrieval, and analysis of images through text.
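The encoder described in the abstract attaches one global node to the grid-feature interaction graph and lets a graph-attention layer update the global and local nodes jointly. A minimal NumPy sketch of one such layer follows; the function name `gat_layer`, the feature dimensions, and the toy 2x2-grid adjacency are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """One graph-attention layer in the style of Velickovic et al.
    h: (N, F) node features; adj: (N, N) 0/1 adjacency with self-loops;
    W: (F, Fp) shared projection; a: (2*Fp,) attention vector."""
    z = h @ W                                   # project all node features
    n = z.shape[0]
    e = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # attention logit e_ij = a^T [z_i || z_j]
            e[i, j] = np.concatenate([z[i], z[j]]) @ a
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    e = np.where(adj > 0, e, -1e9)              # mask non-edges before softmax
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)  # softmax over each node's neighbors
    return att @ z                              # aggregate neighbor features

# Toy graph: 4 local grid nodes (a 2x2 grid, paired neighbors) plus one
# global node (index 4) connected to every grid cell, as in the paper's idea.
rng = np.random.default_rng(0)
h = rng.standard_normal((5, 8))
adj = np.eye(5)
adj[4, :] = adj[:, 4] = 1                       # global node sees all grid cells
adj[0, 1] = adj[1, 0] = adj[2, 3] = adj[3, 2] = 1
W = rng.standard_normal((8, 8)) * 0.1
a = rng.standard_normal(16) * 0.1
out = gat_layer(h, adj, W, a)
print(out.shape)                                # (5, 8): updated global + local nodes
```

Because the global node is adjacent to every grid cell, each attention update mixes whole-image context into the local grid features (and vice versa), which is the mechanism the method relies on before Transformer decoding.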

Keywords: grid features; graph attention network; convolutional neural network; image captioning; global features

Classification codes: TP183 [Automation and Computer Technology / Control Theory and Control Engineering]; TP391.1 [Automation and Computer Technology / Control Science and Engineering]

 
