Video Anomaly Detection Based on Feature Enhancement and Modal Interaction

Authors: Wu Peichen, Li Wenbin, Guo Fang, Liu Zhao[2]

Affiliations: [1] School of Information Network Security, People's Public Security University of China, Beijing 100038; [2] Collaborative Innovation Center for Network Security and Rule of Law, People's Public Security University of China, Beijing 100038

Source: Journal of Computer-Aided Design & Computer Graphics, 2025, No. 3, pp. 407-413 (7 pages)

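Funding: Double First-Class Innovation Research Special Project in Security and Protection Engineering, People's Public Security University of China (2023SYL08).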

Abstract: Contrastive Language-Image Pre-training (CLIP) is a neural network trained contrastively on multimodal data; pre-training on a large number of language-image pairs lets it extract discriminative image features. To capture the temporal relationships between consecutive frames and to eliminate the information-distribution discrepancies between features of different modalities, we propose a video anomaly detection algorithm based on feature enhancement and modal interaction. First, because CLIP captures temporal dependencies poorly when extracting features from consecutive video frames, we build a temporal correlation enhancement module from local and global temporal adapters, which attend to temporal information in the local and global attention layers, respectively. Second, because features from different modalities carry different domain information, we design a multimodal feature interaction module based on shifted window partitioning; sliding windows control the interaction within the features and thereby eliminate the information-distribution discrepancies. Finally, aligning the visual features with the textual features yields frame-level anomaly confidence scores. On the UCF-Crime dataset, the proposed algorithm achieves a detection accuracy of 87.20%, validating its effectiveness.
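The abstract names local and global temporal adapters but does not give their internals. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the "local" adapter is assumed to be a bottleneck with a depthwise temporal convolution over a few neighbouring frames, the "global" adapter is assumed to be bottleneck self-attention over all frames, and both are added residually to per-frame features. All function names, shapes, and weights are hypothetical and randomly initialized.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each frame's feature vector along the channel axis (no learned affine).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_temporal_adapter(x, w_down, kernel, w_up):
    """Hypothetical local adapter: bottleneck + depthwise temporal convolution.

    x: (T, D) frame features, w_down: (D, r), kernel: (k, r) with odd k, w_up: (r, D).
    In the adapter branch, frame t only sees frames t - k//2 .. t + k//2.
    """
    T, k = x.shape[0], kernel.shape[0]
    h = layer_norm(x) @ w_down                     # (T, r) bottleneck
    pad = k // 2
    hp = np.pad(h, ((pad, pad), (0, 0)))           # zero-pad the time axis
    mixed = np.stack([(hp[t:t + k] * kernel).sum(axis=0) for t in range(T)])
    return x + np.maximum(mixed, 0.0) @ w_up       # ReLU, project back, residual

def global_temporal_adapter(x, w_down, w_q, w_k, w_v, w_up):
    """Hypothetical global adapter: self-attention over all T frames in a bottleneck.

    w_down: (D, r), w_q / w_k / w_v: (r, r), w_up: (r, D).
    """
    h = layer_norm(x) @ w_down                     # (T, r)
    q, kk, v = h @ w_q, h @ w_k, h @ w_v
    attn = softmax(q @ kk.T / np.sqrt(q.shape[-1]))  # (T, T) frame-to-frame weights
    return x + (attn @ v) @ w_up                   # residual path preserves input features

rng = np.random.default_rng(0)
T, D, r, k = 16, 32, 8, 3
frames = rng.standard_normal((T, D))               # stand-in for per-frame CLIP features
y = local_temporal_adapter(frames, 0.1 * rng.standard_normal((D, r)),
                           rng.standard_normal((k, r)), 0.1 * rng.standard_normal((r, D)))
w_q, w_k, w_v = (rng.standard_normal((r, r)) for _ in range(3))
z = global_temporal_adapter(y, 0.1 * rng.standard_normal((D, r)),
                            w_q, w_k, w_v, 0.1 * rng.standard_normal((r, D)))
print(y.shape, z.shape)  # (16, 32) (16, 32)
```

The residual form means an adapter with a zero up-projection leaves the pre-trained features unchanged, which is the usual reason adapters are attached to a frozen backbone this way.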
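The interaction module is described only as "window partition shifting" with sliding windows. A minimal NumPy sketch of Swin-style shifted-window attention over a 1D token sequence is shown below, under assumptions that do not come from the abstract: visual and text tokens are simply concatenated into one sequence, Q/K/V projections are the identity, and the shift is half a window. All names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_bounds(n, window, shift=0):
    """[start, end) pairs for non-overlapping windows over n tokens.

    shift > 0 moves the window grid by `shift`; when n is a multiple of
    window this matches the grouping Swin gets from a cyclic shift plus an
    attention mask (no attention across the wrap-around).
    """
    cuts = sorted({0, n} | set(range(shift % window, n, window)))
    return list(zip(cuts[:-1], cuts[1:]))

def window_attention(x, window, shift=0):
    """Self-attention restricted to each window (identity Q/K/V for brevity)."""
    out = np.empty_like(x)
    scale = np.sqrt(x.shape[-1])
    for s, e in window_bounds(len(x), window, shift):
        blk = x[s:e]
        out[s:e] = softmax(blk @ blk.T / scale) @ blk
    return out

def shifted_window_block(x, window):
    """Regular windows, then windows shifted by half a window, each with a residual.
    The shifted pass lets information cross the borders of the first partition."""
    x = x + window_attention(x, window, shift=0)
    return x + window_attention(x, window, shift=window // 2)

rng = np.random.default_rng(0)
visual = rng.standard_normal((6, 16))    # hypothetical: 6 frame tokens
text = rng.standard_normal((2, 16))      # hypothetical: 2 class-prompt tokens
tokens = np.concatenate([visual, text])  # 8 tokens in one shared sequence
fused = shifted_window_block(tokens, window=4)
print(window_bounds(8, 4, 2))  # [(0, 2), (2, 6), (6, 8)]
```

Restricting attention to windows keeps each interaction local and cheap; alternating regular and shifted partitions is what lets information propagate across window borders over successive layers.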
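The final step, aligning visual and textual features to obtain frame-level anomaly confidence, can be illustrated in CLIP's usual zero-shot style. This is a minimal sketch under assumptions, not the paper's scoring rule: each frame is compared with one text embedding per class prompt by cosine similarity, a temperature-scaled softmax gives class probabilities, and the anomaly confidence is taken as 1 minus the probability of the "normal" prompt. The temperature 0.07 (CLIP's initial value) and the prompt set are assumptions, and the features are synthetic.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def frame_anomaly_confidence(frame_feats, text_feats, normal_idx=0, temperature=0.07):
    """Align frame features with class-prompt text features.

    frame_feats: (T, D); text_feats: (C, D), one embedding per class prompt.
    Returns (scores, probs): scores[t] = 1 - P(normal | frame t), probs is (T, C).
    """
    sims = l2_normalize(frame_feats) @ l2_normalize(text_feats).T  # cosine similarity, (T, C)
    logits = sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)            # stable softmax
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return 1.0 - probs[:, normal_idx], probs

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((3, 64))  # hypothetical prompts: "normal", "fighting", "explosion"
frames = np.vstack([text_feats[0] + 0.1 * rng.standard_normal((4, 64)),   # 4 normal-looking frames
                    text_feats[1] + 0.1 * rng.standard_normal((4, 64))])  # 4 fighting-looking frames
scores, probs = frame_anomaly_confidence(frames, text_feats)
print(np.round(scores, 2))
```

Frames built near the "normal" embedding receive scores close to 0 and frames built near an anomaly prompt receive scores close to 1, which is the per-frame signal a detector would threshold or evaluate.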

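Keywords: contrastive language-image pre-training; video anomaly detection; temporal correlation; feature enhancement; modal interaction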

CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
