基于姿态嵌入机制和多尺度注意力的单张着装图像视频合成 (Cited by: 1)

Single dress image video synthesis based on pose embedding and multi-scale attention


Authors: LU Yinwen (陆寅雯); HOU Jue (侯珏); YANG Yang (杨阳)[1,2]; GU Bingfei (顾冰菲); ZHANG Hongwei (张宏伟); LIU Zheng (刘正)[2,3,5]

Affiliations: [1] School of Fashion Design & Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China; [2] Zhejiang Apparel Engineering Research Center, Hangzhou, Zhejiang 310018, China; [3] Key Laboratory of Silk Culture Inheritance and Product Design Digital Technology, Ministry of Culture and Tourism, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China; [4] School of Electronic Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710043, China; [5] School of International Education, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China

Published in: Journal of Textile Research (《纺织学报》), 2024, Issue 7, pp. 165-172 (8 pages)

Funding: National Natural Science Foundation of China Young Scientists Fund (61803292); Zhejiang Provincial Science and Technology Program (2023C03181); Zhejiang Sci-Tech University Research Start-up Foundation (21072325-Y).

Abstract: Generating video from a single dress image has important applications in fields such as virtual try-on and 3-D reconstruction, but existing methods suffer from incoherent motion between generated frames, poor video quality, and missing details of the person's clothing. To address this, a generative adversarial network model based on a pose embedding mechanism and multi-scale attention links is proposed. A pose embedding method is first adopted to model the motion between adjacent frames; attention links are then added to the features at each resolution scale; human parsing images are additionally fed in during training; and the results are finally validated on the test set of a clothing video synthesis dataset. The proposed model improves on current mainstream models for video generation from a single dress image in both qualitative and quantitative results, achieving a peak signal-to-noise ratio of 20.89 and a motion vector score of 0.1084. This shows that the model effectively improves the quality of the generated video and the stability of inter-frame motion, providing a new model for dressed-person video synthesis.

Objective: Video generation based on a single dress image has important applications in the fields of virtual try-on and 3-D reconstruction. However, existing methods have problems such as incoherent movements between generated frames, poor quality of generated videos, and missing clothing details. To address these issues, a generative adversarial network model based on a pose embedding mechanism and multi-scale attention links is proposed.

Method: A generative adversarial network model (EBDGAN) based on a pose embedding mechanism and multi-scale attention was proposed. The pose embedding method was adopted to model adjacent-frame actions and improve the coherence of the generated motion, and attention links were added to the features at each resolution scale to improve feature decoding efficiency and the fidelity of generated image frames. Human parsing images were utilized during the training process to improve the clothing accuracy of the synthesized images.

Results: The learned perceptual image patch similarity (LPIPS) and peak signal-to-noise ratio (PSNR) values indicated that the results generated by EBDGAN were closer to the original video in terms of color and structure. The motion vector (MV) metric showed that the video EBDGAN generated from a single image moved less between adjacent frames and had higher similarity between consecutive frames, making the overall videos more stable. Although the structural similarity index metric (SSIM) score was slightly lower than that of CASD, this method was more efficient as it requires only image and pose information as input. In frames where the characters were far from the camera, EBDGAN retained the details of hair and shoes. In frames where the characters were closer to the camera, the front clothing image generated by EBDGAN retained the collar and hem, such as the collar of the left image in the second row and the hem of the right clothing. When the characters in the video turned around, EBDGAN did not make them exhibit strange poses or lose body parts.
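The abstract does not give the pose-embedding formulation itself, only that a position-embedding method is used to model motion between adjacent frames. As a minimal sketch of the underlying idea (assuming the standard transformer-style sinusoidal position embedding, which is a common choice for encoding frame order; the paper's actual formulation may differ):

```python
import numpy as np

def frame_pose_embedding(frame_idx: int, dim: int = 64) -> np.ndarray:
    """Sinusoidal embedding of a frame index (transformer-style).

    Hypothetical illustration only: even dimensions carry sin, odd
    dimensions carry cos, with geometrically spaced frequencies, so
    that temporal order is encoded for the generator to condition on.
    """
    # Frequencies decay geometrically from 1 down to 1/10000.
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = frame_idx * freqs  # shape (dim // 2,)
    emb = np.empty(dim)
    emb[0::2] = np.sin(angles)
    emb[1::2] = np.cos(angles)
    return emb
```

Embeddings of consecutive frame indices are close in this space, which is what lets a model relate adjacent frames smoothly.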
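The abstract reports a PSNR of 20.89 and a motion-vector score of 0.1084 without defining the metrics in full. As a rough illustration, the sketch below computes standard PSNR and, as an assumed proxy for the MV stability metric (the paper's exact MV definition is not given here), the normalized mean absolute difference between adjacent frames, where lower values mean less movement between consecutive frames:

```python
import numpy as np

def psnr(ref: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a reference and a generated frame."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def mean_frame_difference(frames: np.ndarray) -> float:
    """Normalized mean absolute difference between adjacent frames.

    Assumed proxy for inter-frame motion stability, not the paper's MV
    metric: `frames` has shape (T, H, W) or (T, H, W, C) with 8-bit
    intensities; the result lies in [0, 1], lower meaning more stable.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)) / 255.0
    return float(diffs.mean())
```

Higher PSNR indicates the generated frames are closer to the reference video in pixel terms, which matches how the abstract uses the 20.89 figure.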

Keywords: generative adversarial network; video synthesis; deep learning; pose embedding; attention mechanism; dress image; virtual try-on

CLC number: TS942.8 [Light Industry Technology and Engineering - Fashion Design and Engineering]

 
