Authors: LIU Hao; YANG Xiaoshan [1]; XU Changsheng [1]
Affiliation: [1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Source: Journal of Beijing University of Aeronautics and Astronautics, 2022, No. 8, pp. 1399-1408 (10 pages)
Funding: National Key R&D Program of China (2018AAA0100604); National Natural Science Foundation of China (61720106006, 62036012, 62072455, 61721004, U1836220, U1705262); Key Research Program of Frontier Sciences, Chinese Academy of Sciences (QYZDJ-SSW-JSC039); Beijing Natural Science Foundation (L2010011).
Abstract: Image captioning aims to generate a natural-language description for an input image. In existing datasets, the caption sentences of most images contain a small number of common words and a large number of rare words, following a long-tail distribution. Prior work focuses on caption accuracy over the whole dataset and neglects the accurate description of the many rare words, which limits practical application. To address this problem, a long-tail image captioning model based on a dynamic semantic memory network (DSMN) is proposed; it aims to improve the description of rare nouns while keeping common nouns accurately described. DSMN dynamically mines the global semantic relationship between rare words and common words to transfer semantic knowledge from common words to rare words, and it strengthens the semantic representation and prediction of rare words by jointly considering the global word-relation information and the local semantic information of the input image and the already generated words. To evaluate long-tail image captioning effectively, a task-specific test split, Few-COCO, is constructed from the MS COCO Captioning dataset. Quantitative experiments on MS COCO Captioning and Few-COCO show that DSMN achieves a rare-word description precision of 0.6028%, a recall of 0.3234%, and an F-1 score of 0.3567% on Few-COCO, a clear improvement over the baseline methods.
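As a rough illustration of the memory-read idea summarized above (mixing a global word-relation structure into per-word memory slots, then attending over the memory with the local decoding context), the following is a minimal PyTorch sketch. The class name, dimensions, and the similarity-based relation matrix are illustrative assumptions, not the authors' DSMN implementation.

```python
# Minimal sketch of a dynamic-semantic-memory read step (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSemanticMemory(nn.Module):
    """One semantic slot per vocabulary word; rare-word slots borrow from
    related common-word slots before an attention read conditioned on the
    local decoding context (image features + generated words)."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.slots = nn.Parameter(torch.randn(vocab_size, dim) * 0.02)  # one slot per word
        self.query_proj = nn.Linear(dim, dim)  # maps local context to a memory query

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, dim) local semantic state of the decoder.
        # 1) Global relation: similarity between word slots enables
        #    common-to-rare knowledge transfer.
        normed = F.normalize(self.slots, dim=-1)
        relation = torch.softmax(normed @ normed.t(), dim=-1)   # (V, V)
        enriched = relation @ self.slots                        # each slot mixes in related slots
        # 2) Local read: attend over the enriched memory with the current context.
        query = self.query_proj(context)                        # (batch, dim)
        attn = torch.softmax(query @ enriched.t(), dim=-1)      # (batch, V)
        return attn @ enriched                                  # (batch, dim) memory summary


if __name__ == "__main__":
    memory = DynamicSemanticMemory(vocab_size=1000, dim=256)
    summary = memory(torch.randn(4, 256))  # fuse with the decoder state before word prediction
    print(summary.shape)                   # torch.Size([4, 256])
```

In an actual captioning decoder, the returned memory summary would be fused with the hidden state at each time step before predicting the next word; the details of that fusion are not specified by the abstract.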
Keywords: deep learning; image understanding; image captioning; long-tail distribution; memory network
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]