基于门控时空注意力的视频帧预测模型

Video Frame Prediction Model Based on Gated Spatio-Temporal Attention

作　　者：李卫军张新勇[1] 高庾潇顾建来刘锦彤 LI Weijun;ZHANG Xinyong;GAO Yuxiao;GU Jianlai;LIU Jintong(School of Computer Science and Engineering,North Minzu University,Yinchuan 750021,China;The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission,North Minzu University,Yinchuan 750021,China)

机构地区：[1]北方民族大学计算机科学与工程学院,宁夏银川750021 [2]北方民族大学图像图形智能处理国家民委重点实验室,宁夏银川750021

出　　处：《郑州大学学报（工学版）》2024年第1期70-77,121,共9页Journal of Zhengzhou University（Engineering Science）

基　　金：中央高校基本科研业务费专项资金(2021JCYJ12);国家自然科学基金资助项目(61962001);宁夏自然科学基金资助项目(2021AAC03215);北方民族大学研究生创新项目(YCX23147)。

摘　　要：针对循环式视频帧预测架构存在精度低、训练缓慢,以及结构复杂和误差累积等问题,提出了一种基于门控时空注意力的视频帧预测模型。首先,通过空间编码器提取视频帧序列的高级语义信息,同时保留背景特征;其次,建立门控时空注意力机制,采用多尺度深度条形卷积和通道注意力来学习帧内及帧间的时空特征,并利用门控融合机制平衡时空注意力的特征学习能力;最后,由空间解码器将高级特征解码为预测的真实图像,并补充背景语义以完善细节。在Moving MNIST、TaxiBJ、WeatherBench、KITTI数据集上的实验结果显示,同多进多出模型SimVP相比,MSE分别降低了14.7%、6.7%、10.5%、18.5%,在消融扩展实验中,所提模型达到了较好的综合性能,具有预测精度高、计算量低和推理效率高等优势。A video frame prediction model based on gated spatio-temporal attention was proposed to address the issues of low accuracy,slow training,complex structure,and error accumulation in recurrent video frame prediction architectures.Firstly,high-level semantic information of the video frame sequence was extracted by a spatial encoder while preserving background features.Secondly,a gated spatio-temporal attention mechanism was established,utilizing multi-scale deep bar convolutions and channel attention to learn both intra-frame and inter-frame spatio-temporal features.A gate fusion mechanism was employed to balance the feature learning capability of spatiotemporal attention.Finally,a spatial decoder reconstructed the high-level features into predicted realistic images and complements background semantics to enhance the details.Experimental results on the Moving MNIST,Taxi-BJ,WeatherBench,and KITTI datasets showed that compared to the multi-input multi-output model SimVP,the mean squared error(MSE)was reduced by 14.7%,6.7%,10.5%,and 18.5%,respectively.In ablation and expansion experiments,the proposed model achieved good overall performance,demonstrating advantages such as high prediction accuracy,low computational complexity,and efficient inference.

关键词：视频帧预测卷积神经网络注意力机制门控卷积编解码网络

分类号：TP391.41[自动化与计算机技术—计算机应用技术] TP183[自动化与计算机技术—计算机科学与技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于门控时空注意力的视频帧预测模型

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

基于门控时空注意力的视频帧预测模型

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索