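Affiliation: [1] Gansu University of Political Science and Law, Lanzhou 730070
Source: Forensic Science and Technology (《刑事技术》), 2022, No. 5, pp. 448-457 (10 pages)
Funding: Natural Science Foundation of Gansu Province (20JR10RA334, 21JR7RA570); 2021 Longyuan Youth Innovation and Entrepreneurship Talent Project (2021LQGR20); Gansu University of Political Science and Law university-level innovation projects (GZF2020XZD18, jbzxyb2018-01)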
Abstract: As the Safe City (Ping'an) project advances, surveillance now covers most Chinese cities and generates massive volumes of video every day. These videos are an important public-security resource, but inspecting and processing them manually is a heavy burden, so automating the work with artificial intelligence remains an open problem. Classifying surveillance videos would let redundant footage be discarded and make hard-to-find footage easier to access. Scene-classification algorithms already exist for natural, urban, and indoor scenes, which suggests that AI can also classify surveillance video scenes and screen out those involving crime events that the police need to handle. This paper therefore proposes a video scene classification algorithm based on C3D (3D convolutional neural network) and CBAM-ConvLSTM (convolutional block attention module-convolutional long short-term memory network) to classify crime events in surveillance video. First, a C3D network combined with three-dimensional spatiotemporal and channel attention mechanisms extracts local spatial and local temporal features from the video and highlights the more important ones. Second, the extracted feature sequence is fed into CBAM-ConvLSTM to extract global spatial and global temporal features. Finally, a classifier assigns the input video to a crime-event class based on the global features. The method was validated on two datasets: the self-built crime event dataset Crimes-mini and the public violence dataset Hockey. Crime event classification reached an accuracy of up to 92.19% and an F1 score of up to 90.40%; violence classification reached an accuracy of up to 99.5% and an F1 score of up to 99.5%. The results show that the proposed method classifies crime events and violent behavior in surveillance video fairly effectively.
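The abstract reports accuracy and F1 for both datasets but does not say how F1 is averaged across classes. The sketch below shows one common way to compute the two metrics from per-video labels. Macro averaging, the function names, and the sample labels are all assumptions for illustration; none of them come from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of videos whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed
    # Per-class F1 = 2*TP / (2*TP + FP + FN); every label occurs at least
    # once in y_true or y_pred, so the denominator is never zero.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical labels for six test videos (not data from the paper).
y_true = ["fight", "fight", "normal", "normal", "theft", "theft"]
y_pred = ["fight", "normal", "normal", "normal", "theft", "fight"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

For a two-class dataset such as Hockey, F1 is often reported for the positive (violent) class only rather than macro-averaged, so the choice of averaging should be checked against the full paper before comparing numbers.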