Image captioning based on cross-modal cascaded diffusion model

Authors: CHEN Qiaohong (陈巧红) [1]; GUO Menghao (郭孟浩); FANG Xian (方贤); SUN Qi (孙麒) [1]

Affiliation: [1] School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, Zhejiang, China

Source: Journal of Zhejiang University: Engineering Science (浙江大学学报(工学版)), 2025, No. 4, pp. 787-794 (8 pages)

Funding: Zhejiang Provincial Natural Science Foundation (LQ23F020021).

Abstract: Existing text diffusion methods cannot effectively steer the diffusion process with semantic conditions, and diffusion model training is difficult to bring to convergence. To address these problems, a non-autoregressive image captioning method based on a cross-modal cascaded diffusion model was proposed. A cross-modal semantic alignment module was introduced to align the semantic relationships between the visual and text modalities, and the aligned semantic feature vectors served as the semantic condition for the subsequent diffusion model. A cascaded diffusion model was designed to introduce rich semantic information step by step, so that the generated captions stay consistent with the overall context. The noise schedule of the text diffusion process was enhanced to make the model more sensitive to textual information, allowing the model to be trained more thoroughly and improving its overall performance. Experimental results show that the proposed method generates more accurate and richer captions than conventional image captioning methods. It clearly outperforms other non-autoregressive text generation methods on all evaluation metrics, demonstrating the effectiveness and potential of diffusion models for image captioning.
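
The abstract does not specify how the noise schedule was enhanced. As a hedged illustration of what a noise schedule controls in continuous text diffusion, the sketch compares a DDPM-style linear schedule with the "sqrt" schedule introduced with Diffusion-LM, and applies the standard forward noising step x_t ~ N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) to a word embedding. Neither schedule is claimed to be the one used in this paper. The function names `linear_alpha_bar`, `sqrt_alpha_bar`, and `q_sample`, the toy embedding, and the defaults (T = 2000, s = 1e-4, beta from 1e-4 to 0.02) are assumptions chosen for illustration.

```python
import math
import random


def linear_alpha_bar(t: int, T: int, beta_1: float = 1e-4, beta_T: float = 0.02) -> float:
    """Cumulative signal level alpha_bar_t = prod_{i=1..t} (1 - beta_i),
    with beta_i spaced linearly from beta_1 to beta_T (DDPM-style schedule)."""
    prod = 1.0
    for i in range(1, t + 1):
        frac = (i - 1) / (T - 1) if T > 1 else 0.0
        beta_i = beta_1 + (beta_T - beta_1) * frac
        prod *= 1.0 - beta_i
    return prod


def sqrt_alpha_bar(t: int, T: int, s: float = 1e-4) -> float:
    """'sqrt' schedule alpha_bar_t = 1 - sqrt(t / T + s), clamped at 0 so the
    final step is pure noise. It drops alpha_bar quickly at small t, so even
    early timesteps noticeably corrupt the word embeddings."""
    return max(1.0 - math.sqrt(t / T + s), 0.0)


def q_sample(x0: list[float], alpha_bar: float, rng: random.Random) -> list[float]:
    """Forward noising: draw x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    T = 2000
    rng = random.Random(0)
    x0 = [0.5, -1.2, 0.3, 0.9]  # hypothetical 4-d embedding of one caption token
    for t in (1, 200, 1000, 2000):
        print(f"t={t:4d}  linear={linear_alpha_bar(t, T):.4f}  sqrt={sqrt_alpha_bar(t, T):.4f}")
    print("x_200 under sqrt schedule:", q_sample(x0, sqrt_alpha_bar(200, T), rng))
```

Under these assumed defaults the sqrt schedule retains less signal than the linear one at small t. The motivation given for it in Diffusion-LM is that small amounts of noise barely change which word an embedding rounds to, so faster early corruption gives the denoiser a harder and more informative task; how this paper's enhanced schedule differs is not stated in the abstract.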

Keywords: deep learning; image captioning; diffusion model; multimodal encoder; cascaded structure

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
