Research on Image Captioning Based on Double Attention Model (基于双注意模型的图像描述生成方法研究)  Cited by: 6


Authors: ZHUO Ya-qi [1], WEI Jia-hui, LI Zhi-xin [2]

Affiliations: [1] College of Science, Guilin University of Technology, Guilin, Guangxi 541004, China; [2] Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004, China

Source: Acta Electronica Sinica (《电子学报》), 2022, No. 5, pp. 1123-1130 (8 pages)

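Funding: National Natural Science Foundation of China (No.61966004, No.61866004); Guangxi Natural Science Foundation (No.2019GXNSFDA245018); Innovation Project of Guangxi Graduate Education (No.XY-CBZ2021002).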

Abstract: The attention models in existing image captioning approaches usually adopt word-level attention, which extracts local features from the image as the visual input for generating the current word and therefore lacks accurate guidance from global image information. To address this problem, this paper proposes an image captioning approach based on sentence-level attention. The approach employs a self-attention mechanism to extract sentence-level attention information from the image, which represents the global image information needed to generate the sentence. On this basis, we further propose a double attention model that combines sentence-level attention with word-level attention to generate more accurate descriptions. We apply supervision and optimization at the intermediate stage of the model to address interference between the different kinds of information. In addition, reinforcement learning is applied in two-stage training to optimize the model's evaluation metrics. We evaluate the approach on two benchmark datasets, MSCOCO and Flickr30K. Experimental results show that the proposed approach generates more accurate and richer captions and outperforms a variety of existing attention-based approaches on all evaluation metrics.
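The combination of sentence-level and word-level attention described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's equations, so everything here is an assumption made for illustration: the function names, the scaled dot-product scoring, the mean pooling of the self-attended features, the fusion by concatenation, and the omission of learned projections. It is not the authors' implementation, and it does not cover the intermediate supervision or the reinforcement learning stage.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sentence_level_attention(regions):
    """Assumed form: self-attention over region features, mean-pooled
    into a single global vector for the whole sentence."""
    d = regions.shape[1]
    scores = regions @ regions.T / np.sqrt(d)   # (k, k) region-to-region scores
    weights = softmax(scores, axis=-1)          # each row sums to 1
    attended = weights @ regions                # (k, d) context-mixed regions
    return attended.mean(axis=0)                # (d,) global image summary

def word_level_attention(regions, hidden):
    """Assumed form: the decoder state for the current word attends
    over the region features."""
    d = regions.shape[1]
    scores = regions @ hidden / np.sqrt(d)      # (k,) one score per region
    weights = softmax(scores)                   # attention distribution
    return weights @ regions, weights           # (d,) local context, (k,) weights

def double_attention_context(regions, hidden):
    """Hypothetical fusion: concatenate the global and local contexts."""
    global_ctx = sentence_level_attention(regions)
    local_ctx, weights = word_level_attention(regions, hidden)
    return np.concatenate([global_ctx, local_ctx]), weights

# Toy input: 36 image regions with 8-dimensional features (arbitrary sizes).
rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 8))
hidden = rng.normal(size=8)
context, weights = double_attention_context(regions, hidden)
print(context.shape)
```

In this sketch the sentence-level vector depends only on the image, so it could be computed once per image and reused at every decoding step, while the word-level context is recomputed as the decoder state changes.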

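Keywords: image captioning; encoder-decoder architecture; word-level attention; sentence-level attention; double attention model; reinforcement learning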

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
