Authors: TAO Rui (陶锐), REN Honge (任洪娥) [1,3], CAO Haiyan (曹海燕)
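Affiliations: [1] College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China; [2] College of Computer Science, Hulunbuir University, Hulunbuir 021008, Inner Mongolia, China; [3] Heilongjiang Forestry Intelligent Equipment Engineering Research Center, Harbin 150040, China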
Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), 2024, No. 2, pp. 16-24 (9 pages)
Funding: Natural Science Foundation of Heilongjiang Province (LH2020F040); Fundamental Research Funds for the Central Universities (2572017PZ10).
Abstract: Image captioning is the task of automatically generating a natural-language description that matches the content of an image. When pre-trained models from computer vision and natural language processing are bridged to build an image captioning model, cross-modal semantic consistency is the core problem of embedding both modalities in a shared subspace. This paper splits each image into patches that serve as visual semantic units and associates them freely (open-vocabulary) with language features, which removes the restriction to a limited set of visual feature classes. Two loss functions, masked language modeling and image-text matching, are used jointly, and hard negative samples are selected to train a cross-modal hop network that extracts consistent global semantics; this improves the accuracy of matching highly similar image and text feature points within a neighborhood of the subspace. Experiments on the MS COCO and Flickr30k datasets show improved performance over other models that also generate captions with CLIP+GPT and over other mainstream models, which demonstrates the effectiveness of the proposed model.
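The two ideas named in the abstract, patches as visual units and hard negatives for image-text matching, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the patch size, the similarity measure, or the selection rule, so the non-overlapping tiling, cosine similarity, and "most similar mismatched caption" rule below are assumptions, and the names `split_into_patches` and `hardest_negatives` are hypothetical.

```python
import math


def split_into_patches(image, patch):
    """Split a 2-D grid (list of rows) into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order. Assumption: the height and width
    are exact multiples of `patch`.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def hardest_negatives(image_embs, text_embs):
    """For each image i, pick the caption j != i that is most similar to it.

    Assumption: image i and caption i form the matched (positive) pair, so
    every other caption is a negative; the most similar one is the hardest.
    """
    picks = []
    for i, img in enumerate(image_embs):
        candidates = [j for j in range(len(text_embs)) if j != i]
        picks.append(max(candidates, key=lambda j: cosine(img, text_embs[j])))
    return picks


# Toy 4x4 "image" whose pixel values are just their row-major indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)

# Toy 2-D embeddings; row i of each list belongs to the same image-caption pair.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[0.9, 0.1], [0.2, 0.9], [0.8, 0.6]]
negatives = hardest_negatives(image_embs, text_embs)
```

In a training loop, each image would be paired with its selected negative caption as a "no match" example for the image-text matching loss. Picking the single most similar negative is the simplest rule; sampling negatives in proportion to similarity is a common alternative.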
Keywords: cross-modal; image captioning; pre-trained model; shared subspace; semantic alignment
Classification number: TP751.1 [Automation and Computer Technology: Detection Technology and Automatic Devices]