Authors: Meng Fancong (孟繁聪) [1]; Xu Wei (徐伟) [1]; Li Haibo (李海波); Wu Min (吴闽); Zheng Junjie (郑竣杰); Chen Xing (陈兴)
Affiliations: [1] East China Yixing Pumped Storage Power Co., Ltd., Wuxi 214200, Jiangsu, China; [2] College of Computer and Information, Hohai University, Nanjing 211100, Jiangsu, China
Source: Computer Applications and Software (《计算机应用与软件》), 2025, No. 4, pp. 208-216, 244 (10 pages)
Funding: Science and Technology Project of State Grid Xinyuan Company (SGXY2000074).
Abstract: Existing image captioning models pay little attention to local image details and tend to produce generic descriptions. To address this, a Chinese image captioning method based on a fusion encoder and visual keyword search is proposed. First, a fusion encoder is built that extracts both local and global image features within a single convolutional neural network (CNN), enriching the semantic information available to the long short-term memory (LSTM) decoder. Second, to counter generic phrasing, a CNN-based image retrieval method finds potential visual words, which are integrated into word vector generation during decoding. Third, a reinforcement learning mechanism optimizes the CIDEr metric at the sentence level to improve the lexical diversity of the captions. Experimental results verify the effectiveness of the proposed method.
Keywords: Chinese image captioning; encoder-decoder structure; attention mechanism; image retrieval; reinforcement learning
CLC number: TP3 [Automation and Computer Technology: Computer Science and Technology]
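
The abstract's reinforcement learning step optimizes a sentence-level CIDEr score rather than a per-word loss. The record does not give the exact formulation, so the following is only a minimal sketch of one common way to do this, a policy-gradient loss with a baseline. The function name `sentence_level_pg_loss`, the `baseline` term, and all numeric values are illustrative assumptions, not details taken from the paper.

```python
def sentence_level_pg_loss(token_log_probs, reward, baseline=0.0):
    """Policy-gradient loss for one sampled caption (illustrative sketch).

    token_log_probs: log-probabilities the decoder assigned to each
                     sampled word of the caption.
    reward:          sentence-level score of the sampled caption, for
                     example its CIDEr value (computed elsewhere).
    baseline:        reference score subtracted to reduce variance;
                     this term is an assumption, not from the paper.
    """
    # The whole sentence shares one advantage, so every word of a
    # caption that scores above the baseline is reinforced equally.
    advantage = reward - baseline
    return -advantage * sum(token_log_probs)


# Hypothetical values for a three-word sampled caption.
sampled_log_probs = [-0.5, -1.0, -0.25]
loss = sentence_level_pg_loss(sampled_log_probs, reward=0.9, baseline=0.6)
```

Minimizing this loss raises the probability of captions whose score exceeds the baseline and lowers it for captions that fall below, which is how a non-differentiable metric such as CIDEr can still drive training.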