A Chinese Image Captioning Method Based on Fusion Encoder and Visual Keyword Search


Authors: Meng Fancong; Xu Wei; Li Haibo; Wu Min; Zheng Junjie; Chen Xing

Affiliations: [1] East China Yixing Pumped Storage Power Co., Ltd., Wuxi 214200, Jiangsu, China; [2] College of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China

Source: Computer Applications and Software (《计算机应用与软件》), 2025, Issue 4, pp. 208-216, 244 (10 pages in total)

Funding: Science and Technology Project of State Grid Xinyuan Company (SGXY2000074).

Abstract: Existing image captioning models pay little attention to the local details of an image and tend to produce generic descriptions. To address these problems, a Chinese image captioning method based on a fusion encoder and visual keyword search is proposed. A fusion encoder is constructed that extracts the local and global features of an image simultaneously within a single convolutional neural network (CNN), enriching the semantic information available to the long short-term memory (LSTM) decoder. To counter generic expressions, a CNN-based image retrieval method is used to find potential visual words, which are then integrated into word-vector generation during decoding. A reinforcement learning mechanism is introduced to optimize the CIDEr evaluation metric at the sentence level, improving the lexical diversity of the generated descriptions. Experimental results verify the effectiveness of the proposed method.
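The abstract names two mechanisms without specifying them: retrieving "potential visual words" from visually similar images, and sentence-level reinforcement learning on CIDEr. The sketch below illustrates one plausible reading of each and is not the authors' implementation. It assumes that each database image is represented by a precomputed CNN feature vector plus its segmented Chinese reference captions, that neighbours are ranked by cosine similarity, and that candidate keywords are chosen by counting how many retrieved captions contain each word. For the reinforcement learning step it assumes a self-critical policy-gradient loss, in which the CIDEr score of a greedily decoded caption serves as the baseline for a sampled caption. This is a common choice for CIDEr optimization, but the record does not confirm it. All function names, parameters, and the toy data are hypothetical, and CIDEr itself is not computed here; the rewards are passed in as plain numbers.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed CNN embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def visual_keywords(query_feat, database, k=2, top_n=3, stopwords=frozenset()):
    """Hypothetical visual keyword search.

    database: list of (feature_vector, captions), where captions is a list of
    already-segmented Chinese captions (each a list of words).
    Returns the top_n words that occur in the most captions of the k images
    whose features are most similar to query_feat.
    """
    ranked = sorted(database, key=lambda item: cosine(query_feat, item[0]),
                    reverse=True)
    counts = Counter()
    for _, captions in ranked[:k]:
        for caption in captions:
            # Count each word at most once per caption (caption frequency).
            counts.update(w for w in set(caption) if w not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]


def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    """Assumed SCST-style REINFORCE loss for one sampled caption.

    sample_logprobs: per-word log-probabilities of the sampled caption.
    sample_reward / greedy_reward: sentence-level scores (e.g. CIDEr) of the
    sampled caption and of the greedy-decoded baseline caption.
    Minimising this loss raises the probability of captions that score
    better than the baseline and lowers it for captions that score worse.
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_logprobs)


if __name__ == "__main__":
    toy_db = [
        ([1.0, 0.0], [["一个", "男人", "在", "踢", "足球"],
                      ["男人", "在", "草地", "上", "踢", "足球"]]),
        ([0.9, 0.1], [["一群", "男人", "在", "踢", "足球"]]),
        ([0.0, 1.0], [["一只", "猫", "在", "睡觉"]]),
    ]
    stop = frozenset({"在", "上", "一个", "一群", "一只"})
    print(visual_keywords([1.0, 0.05], toy_db, k=2, top_n=3, stopwords=stop))
    print(self_critical_loss([-0.5, -1.0], sample_reward=1.2, greedy_reward=0.8))
```

In the toy run, the two soccer images are retrieved, so the returned keywords are 男人, 踢 and 足球. The cat image is never retrieved. How such keywords are injected into the LSTM's word vectors is not described in the record and is left out of the sketch.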

Keywords: Chinese image captioning; encoder-decoder architecture; attention mechanism; image retrieval; reinforcement learning

Classification number: TP3 [Automation and Computer Technology / Computer Science and Technology]
