基于姿态嵌入机制和多尺度注意力的单张着装图像视频合成 (Cited by: 1)

Single dress image video synthesis based on pose embedding and multi-scale attention


Authors: LU Yinwen (陆寅雯); HOU Jue (侯珏); YANG Yang (杨阳)[1,2]; GU Bingfei (顾冰菲); ZHANG Hongwei (张宏伟); LIU Zheng (刘正)[2,3,5]

Affiliations: [1] School of Fashion Design & Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China; [2] Zhejiang Apparel Engineering Research Center, Hangzhou, Zhejiang 310018, China; [3] Key Laboratory of Silk Culture Inheritance and Product Design Digital Technology, Ministry of Culture and Tourism, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China; [4] School of Electronic Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710043, China; [5] School of International Education, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China

Published in: Journal of Textile Research (《纺织学报》), 2024, Issue 7, pp. 165-172 (8 pages)

Funding: National Natural Science Foundation of China Young Scientists Fund (61803292); Zhejiang Provincial Science and Technology Program (2023C03181); Zhejiang Sci-Tech University Research Start-up Foundation (21072325-Y).

Abstract: Generating video from a single dress image has important applications in fields such as virtual try-on and 3-D reconstruction, but existing methods suffer from incoherent motion between generated frames, poor video quality, and missing details of the person's clothing. To address this, a generative adversarial network model based on a pose embedding mechanism and multi-scale attention links is proposed. A pose embedding method is first adopted to model the motion between adjacent frames; attention links are then added to the features at each resolution scale; human parsing images are additionally fed in during training; and the results are finally validated on the test set of a clothing video synthesis dataset. The proposed model improves on current mainstream models for video generation from a single dress image in both qualitative and quantitative results, achieving a peak signal-to-noise ratio of 20.89 and a motion vector score of 0.1084. This shows that the model effectively improves the quality of the generated video and the stability of inter-frame motion, providing a new model for dressed-person video synthesis.

Objective: Video generation based on a single dress image has important applications in the fields of virtual try-on and 3-D reconstruction. However, existing methods have problems such as incoherent movements between generated frames, poor quality of generated videos, and missing clothing details. To address these issues, a generative adversarial network model based on a pose embedding mechanism and multi-scale attention links is proposed.

Method: A generative adversarial network model (EBDGAN) based on a pose embedding mechanism and multi-scale attention was proposed. The pose embedding method was adopted to model adjacent-frame actions and improve the coherence of the generated motion, and attention links were added to the features at each resolution scale to improve feature decoding efficiency and the fidelity of generated image frames. Human parsing images were utilized during the training process to improve the clothing accuracy of the synthesized images.

Results: The learned perceptual image patch similarity (LPIPS) and peak signal-to-noise ratio (PSNR) values indicated that the results generated by EBDGAN were closer to the original video in terms of color and structure. The motion vector (MV) metric showed that the video EBDGAN generated from a single image moved less between adjacent frames and had higher similarity between consecutive frames, making the overall videos more stable. Although the structural similarity index metric (SSIM) score was slightly lower than that of CASD, this method was more efficient as it requires only image and pose information as input. In frames where the characters were far from the camera, EBDGAN retained the details of hair and shoes. In frames where the characters were closer to the camera, the front clothing image generated by EBDGAN retained the collar and hem, such as the collar of the left image in the second row and the hem of the right clothing. When the characters in the video turned around, EBDGAN did not make them exhibit strange poses or lose body parts.
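The abstract does not give the pose-embedding formulation itself, only that a position-embedding method is used to model motion between adjacent frames. As a minimal sketch of the underlying idea (assuming the standard transformer-style sinusoidal position embedding, which is a common choice for encoding frame order; the paper's actual formulation may differ):

```python
import numpy as np

def frame_pose_embedding(frame_idx: int, dim: int = 64) -> np.ndarray:
    """Sinusoidal embedding of a frame index (transformer-style).

    Hypothetical illustration only: even dimensions carry sin, odd
    dimensions carry cos, with geometrically spaced frequencies, so
    that temporal order is encoded for the generator to condition on.
    """
    # Frequencies decay geometrically from 1 down to 1/10000.
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = frame_idx * freqs  # shape (dim // 2,)
    emb = np.empty(dim)
    emb[0::2] = np.sin(angles)
    emb[1::2] = np.cos(angles)
    return emb
```

Embeddings of consecutive frame indices are close in this space, which is what lets a model relate adjacent frames smoothly.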
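The abstract reports a PSNR of 20.89 and a motion-vector score of 0.1084 without defining the metrics in full. As a rough illustration, the sketch below computes standard PSNR and, as an assumed proxy for the MV stability metric (the paper's exact MV definition is not given here), the normalized mean absolute difference between adjacent frames, where lower values mean less movement between consecutive frames:

```python
import numpy as np

def psnr(ref: np.ndarray, gen: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a reference and a generated frame."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def mean_frame_difference(frames: np.ndarray) -> float:
    """Normalized mean absolute difference between adjacent frames.

    Assumed proxy for inter-frame motion stability, not the paper's MV
    metric: `frames` has shape (T, H, W) or (T, H, W, C) with 8-bit
    intensities; the result lies in [0, 1], lower meaning more stable.
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0)) / 255.0
    return float(diffs.mean())
```

Higher PSNR indicates the generated frames are closer to the reference video in pixel terms, which matches how the abstract uses the 20.89 figure.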

Keywords: generative adversarial network; video synthesis; deep learning; pose embedding; attention mechanism; dress image; virtual try-on

CLC number: TS942.8 [Light Industry Technology and Engineering - Fashion Design and Engineering]

 
