Authors: Wu Peichen (吴沛宸); Li Wenbin (李文斌); Guo Fang (郭放); Liu Zhao (刘钊) [2]
Affiliations: [1] School of Information Network Security, People's Public Security University of China, Beijing 100038; [2] Collaborative Innovation Center for Network Security and Rule of Law, People's Public Security University of China, Beijing 100038
Source: Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》), 2025, No. 3, pp. 407-413 (7 pages)
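Funding: Double First-Class Innovation Research Special Project in Security and Prevention Engineering, People's Public Security University of China (2023SYL08).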
Abstract: The contrastive language-image pre-training (CLIP) model, a neural network based on multimodal contrastive training, extracts discriminative image features by pre-training on a large number of language-image pairs. To capture the temporal relationships between consecutive frames and to eliminate discrepancies in information distribution between features of different modalities, we propose a video anomaly detection algorithm based on feature enhancement and modality interaction. First, to address the weak temporal dependency of the CLIP model when extracting features from consecutive video frames, we construct a temporal correlation enhancement module from local and global temporal adapters, which attend to temporal information at the local and global attention layers, respectively. Second, to address the inter-domain information discrepancy between features of different modalities, we design a multimodal feature interaction module based on window partition shifting; it controls interaction within the features through a sliding window and thereby eliminates the distribution discrepancy. Finally, by aligning visual and textual features, we obtain a frame-level anomaly confidence. On the UCF-Crime dataset, the proposed algorithm achieves a detection accuracy of 87.20%, validating its effectiveness.
Keywords: contrastive language-image pre-training; video anomaly detection; temporal correlation; feature enhancement; modality interaction
CLC number: TP391.41 [Automation and Computer Technology: Computer Application Technology]
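
The last step of the abstract, aligning visual and textual features to obtain a frame-level anomaly confidence, can be illustrated with a minimal CLIP-style sketch. The abstract does not say how the alignment is computed, so everything here is an assumption made for illustration rather than the authors' method: the function names, the use of exactly two text embeddings (one "normal" prompt, one "abnormal" prompt), the cosine-similarity scoring, and the temperature value of 0.07 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def frame_anomaly_confidence(frame_feats, normal_text, abnormal_text,
                             temperature=0.07):
    """Return one anomaly confidence in [0, 1] per frame.

    Assumption: each frame feature is compared against two text
    embeddings, and a two-way softmax over the temperature-scaled
    similarities gives the probability of the "abnormal" prompt.
    """
    scores = []
    for feat in frame_feats:
        s_norm = cosine(feat, normal_text) / temperature
        s_abn = cosine(feat, abnormal_text) / temperature
        # Subtract the maximum before exponentiating for numerical stability.
        m = max(s_norm, s_abn)
        e_norm = math.exp(s_norm - m)
        e_abn = math.exp(s_abn - m)
        scores.append(e_abn / (e_norm + e_abn))
    return scores


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real visual and text features.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(frame_anomaly_confidence(frames, [1.0, 0.0], [0.0, 1.0]))
```

In this toy example, a frame whose feature points toward the "abnormal" text embedding scores close to 1, one pointing toward the "normal" embedding scores close to 0, and a frame equally similar to both scores 0.5.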