A Survey on Enhancing Image Captioning with Advanced Strategies and Techniques  


Authors: Alaa Thobhani, Beiji Zou, Xiaoyan Kui, Amr Abdussalam, Muhammad Asim, Sajid Shah, Mohammed ELAffendi

Affiliations: [1] School of Computer Science and Engineering, Central South University, Changsha, 410083, China; [2] Electronic Engineering and Information Science Department, University of Science and Technology of China, Hefei, 230026, China; [3] EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia

Source: Computer Modeling in Engineering & Sciences (English-language journal), 2025, Issue 3, pp. 2247-2280, 34 pages

Funding: Supported by the National Natural Science Foundation of China (Nos. U22A2034, 62177047); the High Caliber Foreign Experts Introduction Plan funded by MOST; and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).

Abstract: Image captioning has seen significant research efforts over the last decade. The goal is to generate meaningful, syntactically accurate sentences that describe the visual content depicted in photographs. Many real-world applications rely on image captioning, such as helping people with visual impairments perceive their surroundings. To formulate a coherent and relevant textual description, computer vision techniques are used to comprehend the visual content within an image, followed by natural language processing methods. Numerous approaches and models have been developed to deal with this multifaceted problem, and several have proven to be state-of-the-art solutions in this field. This work offers a distinct perspective that emphasizes the most critical strategies and techniques for enhancing image caption generation. Rather than reviewing all previous image captioning work, we analyze techniques that yield significant performance improvements in image caption generation, including incorporating visual attention methods into image captioning, exploring the types of semantic information in captions, and employing multi-caption generation techniques. Further, advancements such as neural architecture search, few-shot learning, multi-phase learning, and cross-modal embedding within image caption networks are examined for their effects. The comprehensive quantitative analysis conducted in this study identifies cutting-edge methodologies and sheds light on their impact on image captioning technology.

Keywords: image captioning; semantic attention; multi-caption; natural language processing; visual attention methods

Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]

 
