Uncertainty-Based Frame-Associated Short Video Event Detection Method


Authors: LI Yun; WANG Fuyou; JING Peiguang [3]; WANG Su; XIAO Ao

Affiliations: [1] School of Big Data and Artificial Intelligence, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China; [2] Institute of Electrification and Telecommunications, China Railway Design Corporation, Tianjin 300308, China; [3] School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China; [4] College of Electronic Information, Guangxi Minzu University, Nanning, Guangxi 530006, China; [5] School of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004, China

Source: Journal of Computer Applications (《计算机应用》), 2024, No. 9, pp. 2903-2910 (8 pages)

Funding: National Natural Science Foundation of China (61861014); Doctoral Start-up Fund (BS2021025).

Abstract: To address how the frame uncertainty and temporal correlation of short videos can be combined to improve event detection, a frame-associated short video event detection method based on uncertainty perception is proposed. First, a 2D Convolutional Neural Network (CNN) extracts the features of each frame of a short video; these features are then forward-propagated multiple times through a Bayesian variational layer to obtain the feature mean and the uncertainty information corresponding to the features. Second, the uncertainty perception module built into the model fuses the feature mean with the uncertainty information, and the fused per-frame features are passed through a temporal correlation module to strengthen their connections in the time domain. Finally, the temporally correlated features are fed to a classification network to perform short video event detection. Comparative experiments were carried out on a short video event detection dataset crawled from the Flickr platform. The results show that subspace learning methods such as Support Vector Machine (SVM) classify poorly and do not sufficiently explore high-level semantic representations, whereas deep learning methods achieve clearly higher event detection accuracy. Compared with the Sparse Video-Text Transformer (SViTT) method, the proposed method improves accuracy, Average Recall (AR), and Average Precision (AP) by 3.37%, 2.55%, and 2.09%, respectively, which verifies its effectiveness on the short video event detection task.
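
The first step of the abstract (repeated forward passes through a Bayesian variational layer to obtain a feature mean and a matching uncertainty) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The `BayesianLinear` class, the layer sizes, the fixed posterior standard deviation of 0.1, the number of passes, and in particular the `fuse` rule are all hypothetical choices made for illustration; the abstract does not specify how the uncertainty perception module combines the two quantities.

```python
import random
from statistics import fmean, pvariance


class BayesianLinear:
    """Toy variational layer: every weight has a Gaussian posterior N(mu, sigma^2).

    Hypothetical stand-in for the Bayesian variational layer named in the abstract.
    """

    def __init__(self, in_dim, out_dim, rng):
        self.rng = rng
        self.mu = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
        # Assumed fixed posterior standard deviation; a trained model would learn this.
        self.sigma = [[0.1 for _ in range(in_dim)] for _ in range(out_dim)]

    def forward(self, x):
        # Reparameterisation: w = mu + sigma * eps with eps ~ N(0, 1),
        # resampled on every call, so repeated calls give different outputs.
        out = []
        for mu_row, sigma_row in zip(self.mu, self.sigma):
            acc = 0.0
            for m, s, xi in zip(mu_row, sigma_row, x):
                acc += (m + s * self.rng.gauss(0.0, 1.0)) * xi
            out.append(acc)
        return out


def mc_mean_and_uncertainty(layer, x, num_passes=20):
    """Run several stochastic passes; return per-dimension mean and variance."""
    samples = [layer.forward(x) for _ in range(num_passes)]
    per_dim = list(zip(*samples))  # group the samples by output dimension
    mean = [fmean(d) for d in per_dim]
    var = [pvariance(d) for d in per_dim]
    return mean, var


def fuse(mean, var):
    # Assumed fusion rule (not from the paper): shrink each feature
    # more strongly the larger its estimated variance is.
    return [m / (1.0 + v) for m, v in zip(mean, var)]


rng = random.Random(0)
layer = BayesianLinear(in_dim=4, out_dim=3, rng=rng)
frame_feature = [0.2, -1.0, 0.5, 0.8]  # stands in for one frame's CNN feature
mean, var = mc_mean_and_uncertainty(layer, frame_feature)
fused = fuse(mean, var)
print("mean:", mean)
print("variance:", var)
print("fused:", fused)
```

In this sketch the variance across passes serves as the per-feature uncertainty. The temporal correlation module and the classification network described in the abstract are left out.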

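Keywords: temporal correlation; frame-associated short video event; Convolutional Neural Network (CNN); Bayesian neural network; uncertainty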

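Classification numbers: TP391.41 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Computer Science and Technology]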

 
