Visuals to Text: A Comprehensive Review on Automatic Image Captioning  Cited by: 4


Authors: Yue Ming, Nannan Hu, Chunxiao Fan, Fan Feng, Jiangwan Zhou, Hui Yu

Affiliations: [1] Beijing University of Posts and Telecommunications, Beijing 100876, China; [2] School of Creative Technologies, University of Portsmouth, Portsmouth PO1 2DJ, UK

Source: IEEE/CAA Journal of Automatica Sinica (自动化学报(英文版)), 2022, Issue 8, pp. 1339-1365 (27 pages)

Funding: Supported by the Beijing Natural Science Foundation of China (L201023) and the Natural Science Foundation of China (62076030).

Abstract: Image captioning refers to the automatic generation of descriptive text from the visual content of images. It is a technique that integrates multiple disciplines, including computer vision (CV), natural language processing (NLP), and artificial intelligence. In recent years, substantial research effort has been devoted to image caption generation, with impressive progress. To summarize these recent advances, we present a comprehensive review of image captioning, covering both traditional methods and recent deep learning-based techniques. Specifically, we first briefly review early traditional work based on retrieval and templates. We then focus on deep learning-based image captioning research, which we categorize, according to model structure and training manner, into the encoder-decoder framework, attention mechanisms, and training strategies, and introduce each in detail. After that, we summarize the publicly available datasets and evaluation metrics, together with those proposed for specific requirements, and compare state-of-the-art methods on the MS COCO dataset. Finally, we discuss open challenges and future research directions.

Keywords: Artificial intelligence; attention mechanism; encoder-decoder framework; image captioning; multi-modal understanding; training strategies

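Classification codes: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]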

 
