Author affiliations: [1] Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China; [2] Shandong Inspur Science Research Institute Co., Ltd., Jinan 250101, China
Source: Journal of Image and Graphics, 2024, No. 2, pp. 491-505 (15 pages)
Funding: National Natural Science Foundation of China (62171421); Taishan Scholars Young Expert Program of Shandong Province (tsqn202306096).
Abstract: Objective Image inpainting and outpainting can be viewed as the problem of synthesizing unknown regions of an image from its known regions, and both are active research topics in computer vision. In recent years, deep neural networks have become the mainstream approach to these tasks. However, existing solutions frequently treat inpainting and outpainting as separate problems, making a unified treatment difficult; moreover, most models are built on convolutional neural networks (CNNs), whose local receptive fields make it hard to synthesize long-range content. To address these two issues, this paper follows a divide-and-conquer strategy and combines CNNs with a Transformer to build a unified framework and model for image inpainting and outpainting. Method The proposed approach decomposes the task into three stages: representation, prediction, and synthesis. The representation and synthesis stages are implemented with CNNs, fully exploiting their local correlation for image-to-feature mapping and feature-to-image reconstruction; the CNN encoder incorporates partial convolutions and pixel normalization to reduce the introduction of irrelevant information from unknown regions. The core prediction stage is implemented with a Transformer, leveraging its strong ability to model global context; the Transformer has proven highly effective at capturing long-range dependencies in domains such as natural language processing. To reduce the difficulty of predicting features for large unknown regions all at once, a mask self-growing strategy is proposed that predicts features iteratively. Finally, adversarial learning is introduced to improve the realism of the generated images. Result Comparative evaluations of inpainting and outpainting on multiple datasets show that the proposed method outperforms the compared methods on all metrics. Ablation studies show that the decomposed model outperforms a non-decomposed variant, confirming the effectiveness of the divide-and-conquer design. A detailed experimental analysis of the mask self-growing strategy shows that iterative prediction effectively improves synthesis quality. The influence of key Transformer structural parameters on model performance is also explored. Conclusion This paper proposes a unified iterative-prediction framework for image inpainting and outpainting that outperforms the compared methods, with every component contributing to the performance gain, demonstrating the value and potential of the framework for these two tasks.
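The mask self-growing iterative prediction described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `predict_frontier` is a hypothetical stand-in for the Transformer predictor, and a simple 4-neighbour binary dilation plays the role of the mask-growth step.

```python
import numpy as np

def dilate4(mask):
    """One step of 4-neighbour binary dilation on a 2-D boolean mask."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def iterative_fill(features, known_mask, predict_frontier, max_iters=64):
    """Grow the known region one band at a time, predicting only the
    newly exposed frontier at each step rather than all unknown
    positions at once."""
    feats = features.copy()
    mask = known_mask.astype(bool).copy()
    for _ in range(max_iters):
        if mask.all():
            break
        frontier = dilate4(mask) & ~mask           # band adjacent to the known region
        feats[frontier] = predict_frontier(feats, mask, frontier)
        mask |= frontier                           # newly predicted band becomes known
    return feats, mask

# Toy usage: the left half of a 5x5 feature map is known; the stand-in
# predictor copies the mean of the currently known features into the frontier.
feats = np.zeros((5, 5))
known = np.zeros((5, 5), dtype=bool)
known[:, :2] = True
feats[known] = 1.0
filled, final = iterative_fill(feats, known, lambda f, m, fr: f[m].mean())
```

The loop makes each prediction step condition on everything filled so far, which is the motivation the abstract gives for iterating instead of predicting the whole unknown region in parallel.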
Keywords: image inpainting; image outpainting; divide and conquer; iterative prediction; Transformer; convolutional neural network (CNN)
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]