Authors: 潘振鹏 (PAN Zhenpeng); 李志军 (LI Zhijun); 薛超然 (XUE Chaoran); 黎鑫 (LI Xin); 吴克伟 (WU Kewei)[1]; 谢昭 (XIE Zhao)[1]
Affiliation: [1] School of Computer Science & Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2025, No. 2, pp. 68-76 (9 pages)
Funding: Natural Science Foundation of Anhui Province (JZ2024AKZR0571).
Abstract: Unsupervised video anomaly detection aims to locate the frames in which abnormal events occur, given videos that carry only video-level labels. Without frame-level labels, normal and abnormal frames within the same video are hard to tell apart. To analyze the temporal and appearance features of normal and abnormal frames, this paper proposes a time-appearance diffusion Transformer for unsupervised video anomaly detection. In this model, a Transformer encoder extracts video frame features. The temporal energy diffusion module diffuses the temporal features with Gaussian noise, generating a set of noised temporal features. It selects noised samples by one-step Monte Carlo sampling and judges whether each noised sample is credible from its cosine similarity and mean squared error with respect to the original sample. The module further uses a single-iteration noising process and a multi-stride sampling process to learn more complex temporal variation in the sample features. The appearance energy diffusion module applies single-iteration noising and multi-stride sampling to the appearance features, learning complex appearance variation. Together, the temporal and appearance energy diffusion modules describe credible time-appearance features of video frames; the two are complementary and improve the separation of normal and abnormal samples. A Transformer decoder predicts the anomaly scores. Experiments on four datasets (CUHK Avenue, ShanghaiTech, UCF-Crime, and UBnormal) show that the time-appearance diffusion Transformer outperforms existing unsupervised video anomaly detection methods.
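The credibility check described in the abstract (add Gaussian noise to a feature, then compare the noised sample with the original by cosine similarity and mean squared error) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of candidates, the noise scale `sigma`, and the thresholds `min_cosine` and `max_mse` are all hypothetical values chosen for illustration, and the simple draw-and-filter loop only loosely stands in for the paper's one-step Monte Carlo sampling.

```python
import math
import random


def add_gaussian_noise(feature, sigma, rng):
    """Return a copy of `feature` with i.i.d. Gaussian noise (std `sigma`) added."""
    return [x + rng.gauss(0.0, sigma) for x in feature]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_squared_error(a, b):
    """Average squared element-wise difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def credible_noisy_samples(feature, num_samples=8, sigma=0.1,
                           min_cosine=0.9, max_mse=0.05, seed=0):
    """Draw noised candidates and keep those that stay close to the original.

    All defaults are hypothetical; the abstract does not state the thresholds
    or how the two measures are combined.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(num_samples):
        noisy = add_gaussian_noise(feature, sigma, rng)
        close_in_direction = cosine_similarity(feature, noisy) >= min_cosine
        close_in_value = mean_squared_error(feature, noisy) <= max_mse
        if close_in_direction and close_in_value:
            kept.append(noisy)
    return kept
```

Calling `credible_noisy_samples([1.0, 2.0, 3.0])` returns the subset of eight noised copies that pass both checks; raising `sigma` or tightening the thresholds shrinks that subset.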
Keywords: unsupervised video anomaly detection; diffusion model; Transformer
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]