Affiliations: [1] School of Artificial Intelligence, Hubei University, Wuhan 430062, China; [2] Key Laboratory of Intelligent Sensing System and Security, Ministry of Education, Wuhan 430062, China; [3] Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Source: Journal of Image and Graphics (《中国图象图形学报》), 2024, No. 11, pp. 3305-3318 (14 pages)
Funding: National Natural Science Foundation of China (62273135); Hubei Provincial College Students' Innovation and Entrepreneurship Training Program (S202310512042, S202310512025); Hubei University Graduate Education and Teaching Reform Research Project (1190017755); Hubei University Original Exploration Seed Project (202416403000001).
Abstract: Objective: Virtual-to-real translation of driving scenes suffers from a scarcity of paired data samples, imprecise translation results, and unstable model training. To address these problems, a conditional diffusion model with multimodal data fusion is proposed. Method: First, to avoid the mode collapse and training instability of mainstream image translation methods based on generative adversarial networks (GANs), the translation model is built on a diffusion model, which offers strong generation diversity and stable training. Second, because a conventional diffusion model cannot incorporate prior information and therefore cannot control image generation, a multimodal feature fusion method based on multi-head self-attention is proposed; it injects multimodal information into the denoising process of the diffusion model and thus acts as a conditioning mechanism. Finally, exploiting the fact that semantic segmentation maps and depth maps encode object contours and scene depth, respectively, these maps are fused with the noisy image and fed into the denoising network, yielding a conditional diffusion model with multimodal data fusion that performs more precise driving-scene image translation. Results: The proposed model is trained on the Cityscapes dataset and compared with state-of-the-art methods. It produces driving-scene translations with finer contour detail and more consistent depth, and it achieves better scores on the Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS) metrics, namely 44.20 and 0.377, respectively. Conclusion: The proposed method effectively alleviates the scarcity of data samples, imprecise translation results, and unstable training that affect existing image translation methods, improves the translation accuracy of driving scenes, and provides theoretical support and a data foundation for safe and practical autonomous driving.

Objective: Safety is the most important consideration for autonomous vehicles. New autonomous driving methods need numerous rounds of training and testing before they can be applied in real vehicles, but training and testing directly in real-world scenarios is costly and risky. Many researchers therefore first train and test their methods in simulated scenarios and then transfer the learned knowledge to the real world. However, the two kinds of scenarios differ considerably in scene modeling, lighting, and vehicle dynamics, so an autonomous driving model trained in simulation cannot be effectively generalized to real-world scenarios. With the development of deep learning, image translation, which aims to transform the content of an image from one presentation form to another, has achieved considerable success in fields such as image beautification, style transfer, scene design, and video special effects. Applying image translation to the conversion of simulated driving scenes into real ones can not only mitigate the poor generalization of autonomous driving models but also markedly reduce the cost and risk of training in real scenarios. Unfortunately, existing image translation methods applied to autonomous driving lack datasets of paired simulated and real scenes, and most mainstream methods are based on generative adversarial networks (GANs), which suffer from mode collapse and unstable training. The generated images also exhibit many detail problems, such as distorted object contours and unnatural small objects in the scene. These problems degrade the perception module of autonomous driving and, in turn, its decision making, and they also lower the evaluation metrics of image translation. In this paper, a multimodal conditional diffusion model based on the denoising diffusion probabilistic model (DDPM) is proposed.
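The abstract describes the attention-based conditioning only at a high level. The sketch below shows, in PyTorch, one plausible way to fuse noisy-image, segmentation, and depth features with multi-head self-attention; the module name, channel width, head count, and token layout are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of multi-head self-attention fusion of noisy-image,
# segmentation, and depth features; all names and sizes are assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, channels=256, num_heads=8):
        super().__init__()
        # One 1x1 projection per modality so all features share a channel width.
        self.proj_img = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_seg = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_dep = nn.Conv2d(channels, channels, kernel_size=1)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, img_feat, seg_feat, dep_feat):
        b, c, h, w = img_feat.shape
        # Concatenate the three modalities along the token axis: (B, 3*H*W, C).
        tokens = torch.cat(
            [
                self.proj_img(img_feat).flatten(2).transpose(1, 2),
                self.proj_seg(seg_feat).flatten(2).transpose(1, 2),
                self.proj_dep(dep_feat).flatten(2).transpose(1, 2),
            ],
            dim=1,
        )
        fused, _ = self.attn(tokens, tokens, tokens)  # self-attention across modalities
        fused = self.norm(fused + tokens)             # residual connection + layer norm
        # Keep only the image tokens as the conditioned feature map.
        img_tokens = fused[:, : h * w, :]
        return img_tokens.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    f = MultimodalFusion()
    x = torch.randn(2, 256, 16, 16)
    print(f(x, x.clone(), x.clone()).shape)  # torch.Size([2, 256, 16, 16])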
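The final fusion step, concatenating the noisy image with the segmentation and depth maps before the denoising network, can likewise be sketched as a single DDPM training step. The small convolutional denoiser below is a stand-in for the paper's unspecified network, the timestep embedding is omitted for brevity, and the linear noise schedule is the standard DDPM choice rather than anything stated in the abstract.

# Minimal sketch of one DDPM training step with channel-wise conditioning:
# the noisy image is concatenated with the segmentation and depth maps before
# entering the denoiser. The tiny CNN is only a placeholder; all sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # standard linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                          # placeholder denoising network
    nn.Conv2d(3 + 3 + 1, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 3, 3, padding=1),                # predicts the noise eps
)
opt = torch.optim.Adam(denoiser.parameters(), lr=2e-4)

def train_step(x0, seg, depth):
    """x0: real image (B,3,H,W); seg: segmentation map (B,3,H,W); depth: (B,1,H,W)."""
    t = torch.randint(0, T, (x0.size(0),))
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps     # forward diffusion q(x_t | x_0)
    inp = torch.cat([x_t, seg, depth], dim=1)      # fuse conditions channel-wise
    loss = F.mse_loss(denoiser(inp), eps)          # standard DDPM epsilon loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

x0 = torch.randn(4, 3, 64, 64)
print(train_step(x0, torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64)))

Channel-wise concatenation is the simplest way to expose the conditions to the denoiser at every diffusion step; an attention module such as the one sketched above would augment this fusion inside the network.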
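For the reported FID of 44.20 and LPIPS of 0.377, the sketch below shows how these two metrics are commonly computed with the third-party torchmetrics and lpips packages; the random tensors merely stand in for real and translated Cityscapes images, and this is not the authors' evaluation code.

# Hedged sketch of FID and LPIPS evaluation using the torchmetrics and lpips
# packages; dataset loading is faked with random tensors for illustration.
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

fake = torch.randint(0, 256, (8, 3, 128, 128), dtype=torch.uint8)  # translated images
real = torch.randint(0, 256, (8, 3, 128, 128), dtype=torch.uint8)  # real driving images

# FID compares Inception feature statistics; a meaningful score needs many
# more than 8 samples per set.
fid = FrechetInceptionDistance(feature=2048)
fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())

# LPIPS expects float images scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")
to_unit = lambda x: x.float() / 127.5 - 1.0
print("LPIPS:", loss_fn(to_unit(fake), to_unit(real)).mean().item())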
Keywords: virtual-to-real; image translation; diffusion model; multimodal fusion; driving scene
CLC number: TP391.41 [Automation and Computer Technology - Computer Application Technology]