A Dynamic Virtual Try-on Method Based on an STN Clothing Warping Network (Cited by: 1)


Authors: HU Xinrong, KE Tingfeng[3], LUO Ruiqi, ZHANG Ziyi, LIANG Jinxing, YANG Kai, PENG Tao[3] (State Key Laboratory of New Textile Material and Advanced Processing Technologies, Wuhan 430200, Hubei, China; Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, Hubei, China; School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, Hubei, China)

Affiliations: [1] State Key Laboratory of New Textile Material and Advanced Processing Technologies, Wuhan 430200, Hubei, China; [2] Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, Hubei, China; [3] School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, Hubei, China

Source: Journal of Wuhan University: Natural Science Edition, 2024, Issue 3, pp. 349-357 (9 pages)

Funding: Key Research and Development Project of Ningbo Science and Technology Bureau (2021Z069).

Abstract: Dynamic virtual try-on matches a target garment to a person in video in a spatio-temporally consistent way, aiming to generate coherent, smooth, and realistic try-on videos. Pose changes during dynamic try-on cause problems such as garment self-occlusion and blurred prints. This paper therefore proposes a dynamic virtual try-on method with a clothing warping network based on the Spatial Transformer Network (STN). In the clothing warping network, a Transformer module exploits its ability to capture both global information and locally salient information to strengthen the data feature regions, and the STN module predicts the clothing warping range with a learnable Thin Plate Spline (TPS) interpolation, producing the warped image and its mask. The try-on network uses a U-Net with a self-attention mechanism to align the warped image mask with the human-body representation and generate high-quality try-on images. Finally, a dynamic synthesis network resolves the temporal consistency of video frames and produces a coherent, high-quality try-on video. On the VVT dataset, compared with CP-VTON, the proposed method improves the average Structural Similarity Index (SSIM) by 0.076 and lowers the average Learned Perceptual Image Patch Similarity (LPIPS) by 0.420; compared with FW-GAN, it reduces the I3D and ResNeXt101 metrics by 0.089 and 2.252, respectively. On the VITON-HD dataset, the SSIM of the proposed method also exceeds that of CP-VTON and FW-GAN, further indicating that the images it generates are of high quality and low distortion.
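The learnable TPS warping that the abstract attributes to the STN module can be illustrated with a minimal NumPy sketch. This is a deliberately simplified assumption-laden illustration, not the paper's implementation: the paper's network predicts the control-point displacements, whereas here the classical TPS interpolation system is simply solved for hand-picked control points (`fit_tps`, `tps_transform`, and the point values are all hypothetical):

```python
import numpy as np

def tps_kernel(r):
    # TPS radial basis U(r) = r^2 * log(r^2); the small epsilon keeps U(0) = 0.
    return r**2 * np.log(r**2 + 1e-12)

def fit_tps(src, dst):
    """Solve the TPS interpolation system so that f(src[i]) = dst[i].
    src, dst: (n, 2) control points before and after warping."""
    n = src.shape[0]
    K = tps_kernel(np.linalg.norm(src[:, None] - src[None, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), src])          # affine part [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst
    return np.linalg.solve(L, Y)                   # (n+3, 2): radial weights + affine

def tps_transform(params, src, pts):
    """Warp arbitrary points pts (m, 2) with the fitted spline."""
    U = tps_kernel(np.linalg.norm(pts[:, None] - src[None, :], axis=-1))
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    n = src.shape[0]
    return U @ params[:n] + P @ params[n:]

# Hypothetical control points: four garment corners plus the center,
# displaced as a pose change might displace them.
src = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [.5, .5]])
dst = src + np.array([[.1, 0.], [0., .05], [-.05, .1], [.05, -.05], [0., .2]])
params = fit_tps(src, dst)
warped = tps_transform(params, src, src)   # control points land on dst exactly
```

In the paper's setting, the displacements `dst - src` would be predicted by the STN rather than given, and the fitted mapping would be applied to a dense pixel grid to warp the garment image and its mask.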

Keywords: dynamic virtual try-on; Spatial Transformer Network; U-Net; self-attention mechanism

CLC number: TP391.41 [Automation and Computer Technology - Computer Application Technology]

 
