From image to language: image captioning and description (从图像到语言:图像标题生成与描述)  Cited by: 3


Authors: Tan Yunlan (谭云兰) [1,2,3]; Tang Pengjie (汤鹏杰); Zhang Li (张丽); Luo Yupan (罗玉盘)

Affiliations: [1] School of Electronics and Information Engineering, Jinggangshan University, Ji'an 343009, China; [2] Jiangxi Engineering Laboratory of IoT Technologies for Crop Growth, Ji'an 343009, China; [3] Network Information Center, Jinggangshan University, Ji'an 343009, China

Source: Journal of Image and Graphics (《中国图象图形学报》), 2021, No. 4, pp. 727-750 (24 pages)

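Funding: National Natural Science Foundation of China (62062041); Doctoral Research Startup Project of Jinggangshan University (JZB1923, JZB1807); Jiangxi Provincial Art Science Planning Project (YG2017283); Bidding Project of the Jiangxi Provincial University Humanities and Social Sciences Base (JD17082); General Project of the Jiangxi University Informatization Society (GJJ191662, GJJ191663).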

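Abstract: The task of image captioning and description is to have a computer automatically translate an image into natural language. The research has broad application prospects in areas such as visual assistance for humans and the development of intelligent human-machine environments, and it also supports research on tasks such as image retrieval, high-level visual semantic reasoning, and personalized description. Image data are highly nonlinear and complex, whereas human natural language is abstract and logically rigorous, so having a computer automatically abstract and summarize image content is very challenging. This paper describes the task of simple image captioning and description, analyzes simple description generation methods based on handcrafted features, and reviews and summarizes deep-feature-based methods, including those based on global visual features, visual feature selection and optimization, and optimization strategies. For the task of fine-grained image description, it analyzes the main current models and methods for image "dense captioning" and structured description. In addition, it analyzes image description methods that incorporate sentiment information and personalized expression. Throughout the analysis, it points out the shortcomings of the current image captioning and description methods and proposes possible research trends and solutions. Datasets commonly used in the field, such as MS COCO2014 (Microsoft common objects in context) and Flickr30K, are introduced in detail, and the performance of representative models for simple image captioning, dense captioning and paragraph description, and sentimental image captioning on these datasets is compared and analyzed. Because of the complexity of visual data and the abstractness of natural language, and particularly for image description tasks that incorporate sentiment and personalized expression, many problems remain to be solved in feature extraction and representation, the selection and embedding of semantic vocabulary, dataset construction, and description evaluation.

Image captioning and description belong to high-level visual understanding. They translate an image into natural language with suitable words, appropriate sentence patterns, and correct grammar. The task is interesting and has wide application prospects in early education, aid for the visually impaired, automatic explanation, auto-reminding, the development of intelligent interactive environments, and even the design of intelligent robots. It also provides support for studying image retrieval, object detection, visual semantic reasoning, and personalized description. At present, the task has attracted the attention of several researchers, and a large number of effective models have been proposed and developed. However, the task is difficult and challenging because the model has to bridge visual information and natural language and close the semantic gap between data of different modalities. In this work, the development timeline, popular frameworks and models, frequently used datasets, and corresponding performance of image captioning and description are surveyed comprehensively. Additionally, the remaining questions and limitations of current works are investigated and analyzed in depth. Overall, this study covers image captioning and description in four parts: 1) simple image captioning and description (generally one sentence is generated per image), including handcrafted-feature-based methods and deep-feature-based approaches; 2) image dense captioning (multiple but relatively independent sentences are generally generated) and refined paragraph description (a paragraph with a certain structure and logic is generally generated); 3) personalized and sentimental image captioning and description (a sentence with a personalized style and sentimental words is generally generated); and 4) the corresponding evaluation datasets, metrics, and performance of the popular models. For the first part, the research history of image captioning and description is first introduced, including template-based framework and visual semantic retrieval-based framewor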

Keywords: image captioning; deep features; visual description; paragraph generation; image sentiment; logical semantics

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
