Authors: HU Xinrong; KE Tingfeng[3]; LUO Ruiqi; ZHANG Ziyi; LIANG Jinxing; YANG Kai; PENG Tao[3]
Affiliations: [1] State Key Laboratory of New Textile Material and Advanced Processing Technologies, Wuhan 430200, Hubei, China; [2] Engineering Research Center of Hubei Province for Clothing Information, Wuhan 430200, Hubei, China; [3] School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, Hubei, China
Source: Journal of Wuhan University (Natural Science Edition), 2024, No. 3, pp. 349-357 (9 pages)
Fund: Key R&D Project of the Ningbo Science and Technology Bureau (2021Z069)
Abstract: Dynamic virtual try-on aims to match a target garment to a person in video in a spatiotemporally consistent way, so as to generate a coherent, smooth, and realistic try-on video. Pose changes during dynamic try-on cause problems such as garment self-occlusion and blurred prints. This paper therefore proposes a dynamic virtual try-on method built on a garment-warping network based on the Spatial Transformer Network (STN). In the garment-warping network, a Transformer module exploits both global context and salient local information to strengthen the feature regions, while the STN module predicts the extent of the garment warp with a learnable Thin Plate Spline (TPS) interpolation, yielding the warped image and its mask. The try-on network then uses a U-Net with a self-attention mechanism to align the warped image, its mask, and the body representation, generating high-quality try-on images. Finally, a dynamic synthesis network enforces temporal consistency across video frames, producing a coherent, high-quality try-on video. On the VVT dataset, compared with CP-VTON, the proposed method improves the average Structural Similarity Index (SSIM) by 0.076 and lowers the average Learned Perceptual Image Patch Similarity (LPIPS) by 0.420; compared with FW-GAN, it reduces the I3D and ResNeXt101 metrics by 0.089 and 2.252, respectively. On the VITON-HD dataset, the method's SSIM also exceeds that of CP-VTON and FW-GAN, further indicating that the images it generates are high in quality and low in distortion.
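The TPS interpolation at the heart of the STN module fits a smooth 2D warp that exactly maps a set of control points to their targets. A minimal NumPy sketch of that fit-and-warp step is below; it is a hypothetical illustration only (the paper's STN learns the control-point displacements inside a network, which this sketch does not model), and the function names `tps_fit` and `tps_transform` are invented for this example:

```python
import numpy as np

def tps_fit(src, dst):
    """Solve for thin-plate-spline parameters mapping src -> dst.

    src, dst: (n, 2) arrays of 2D control points.
    Returns a (n+3, 2) array: n radial weights followed by 3 affine terms.
    """
    n = src.shape[0]
    # Pairwise squared distances between control points.
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    # TPS radial kernel U(r) = r^2 log r = 0.5 * r^2 * log(r^2); U(0) = 0.
    K = 0.5 * d2 * np.log(np.where(d2 == 0.0, 1.0, d2))
    P = np.hstack([np.ones((n, 1)), src])          # affine part [1, x, y]
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T                                # side conditions on weights
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)

def tps_transform(params, src, pts):
    """Apply the fitted TPS warp to query points pts of shape (m, 2)."""
    n = src.shape[0]
    d2 = np.sum((pts[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    U = 0.5 * d2 * np.log(np.where(d2 == 0.0, 1.0, d2))
    w, a = params[:n], params[n:]
    return U @ w + np.hstack([np.ones((pts.shape[0], 1)), pts]) @ a
```

Because the linear system interpolates exactly, warping the source control points reproduces the targets, while points in between are deformed smoothly; in the full method this warp is applied to the garment image and its mask over a dense pixel grid.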
Keywords: dynamic virtual try-on; Spatial Transformer Network; U-Net; self-attention mechanism
CLC number: TP391.41 (Automation and Computer Technology / Computer Application Technology)