Authors: 李卫军 (LI Weijun), 张新勇 (ZHANG Xinyong) [1], 高庾潇 (GAO Yuxiao), 顾建来 (GU Jianlai), 刘锦彤 (LIU Jintong)
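Affiliations: [1] School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, Ningxia, China; [2] The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, Ningxia, China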
Source: Journal of Zhengzhou University (Engineering Science) 《郑州大学学报(工学版)》, 2024, No. 1, pp. 70-77, 121 (9 pages in total)
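Funding: Fundamental Research Funds for the Central Universities (2021JCYJ12); National Natural Science Foundation of China (61962001); Natural Science Foundation of Ningxia (2021AAC03215); Graduate Innovation Project of North Minzu University (YCX23147).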
Abstract: A video frame prediction model based on gated spatio-temporal attention is proposed to address the low accuracy, slow training, structural complexity, and error accumulation of recurrent video frame prediction architectures. First, a spatial encoder extracts high-level semantic information from the video frame sequence while preserving background features. Second, a gated spatio-temporal attention mechanism is established: multi-scale deep bar convolutions and channel attention learn intra-frame and inter-frame spatio-temporal features, and a gated fusion mechanism balances the feature learning capability of the spatial and temporal attention. Finally, a spatial decoder decodes the high-level features into the predicted frames and supplements background semantics to refine details. Experiments on the Moving MNIST, TaxiBJ, WeatherBench, and KITTI datasets show that, compared with the multi-input multi-output model SimVP, the mean squared error (MSE) is reduced by 14.7%, 6.7%, 10.5%, and 18.5%, respectively. In ablation and extension experiments, the proposed model achieves good overall performance, with high prediction accuracy, low computational cost, and efficient inference.
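The percentages quoted in the abstract are relative MSE reductions against the SimVP baseline. The record does not spell out the formula, so the following is only a minimal sketch assuming the usual definition, (baseline - proposed) / baseline. The function name `relative_reduction` and the numbers in the usage lines are hypothetical placeholders, not values from the paper.

```python
def relative_reduction(baseline_mse: float, model_mse: float) -> float:
    """Return the MSE reduction as a percentage of the baseline MSE.

    Assumed definition (not stated in the record):
        (baseline_mse - model_mse) / baseline_mse * 100
    """
    if baseline_mse <= 0:
        raise ValueError("baseline_mse must be positive")
    return (baseline_mse - model_mse) / baseline_mse * 100.0


# Hypothetical placeholder values, NOT results from the paper.
example = relative_reduction(baseline_mse=20.0, model_mse=17.0)
print(f"MSE reduced by {example:.1f}%")
```

A positive result means the proposed model has a lower error than the baseline; a negative result would mean it is worse.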