Dual-Process Short Video Classification Method Based on Deep Learning

Cited by: 3

Authors: ZHANG Aihan; LIU Xiang [1]; SHI Yunyu [1]; LIU Siqi (School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)

Affiliation: [1] School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China

Source: Computer Engineering (《计算机工程》), 2022, No. 7, pp. 277-283 (7 pages)

Funding: Science and Technology Innovation Project of the Ministry of Culture (2015KJCXXM19).

Abstract: With the spread of smartphones and 5G networks, short videos have become a main channel through which people acquire knowledge in fragmented time. To address the shortage of short video datasets for real-life scenarios and the low accuracy of short video classification, this study proposes a dual-process short video classification method that integrates deep learning. In the main process, an A-VGG-3D network model is constructed: a VGG network with an attention mechanism extracts features, and an optimized 3D Convolutional Neural Network (3D CNN) classifies the short videos, improving continuity, balance, and robustness in the temporal dimension. In the auxiliary process, the frame difference method is used to detect shot switches and extract several frames from each short video; multi-scale face detection is then performed on the extracted frames by combining a sliding window mechanism with a cascade classifier, which further improves classification accuracy. Experimental results show that, on the UCF101 dataset and a self-built dataset of life-scene short videos, the precision and recall of the method for non-plot and non-interview short videos reach up to 98.9% and 98.6%, respectively. Compared with a short video classification method based on the C3D network, the proposed method improves classification accuracy on the UCF101 dataset by 9.7 percentage points, indicating stronger generality.
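The frame-extraction step of the auxiliary process can be illustrated with a minimal sketch. The abstract only states that the frame difference method is used to detect shot switches; the difference measure (mean absolute pixel difference), the threshold value, and the function names below are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# A grayscale frame as rows of 0-255 intensities (illustrative representation).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_shot_frames(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that appear to start a new shot.

    Frame 0 is always kept. A later frame is kept when it differs from its
    predecessor by more than `threshold` (a hypothetical, untuned value).
    """
    if not frames:
        return []
    selected = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            selected.append(i)
    return selected


# Toy clip: two dark frames, then a cut to two bright frames.
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
clip = [dark, dark, bright, bright]
shot_frame_indices = select_shot_frames(clip)
```

In a pipeline like the one described, the selected frames would then be passed to the face detector; that stage is not sketched here because the abstract does not specify the detector's configuration.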

Keywords: 3D convolutional neural network; deep learning; VGG network; attention mechanism; short video classification

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
