Authors: ZHANG Aihan (张瑷涵), LIU Xiang (刘翔)[1], SHI Yunyu (石蕴玉)[1], LIU Siqi (刘思齐)
Affiliation: [1] School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
Source: Computer Engineering (《计算机工程》), 2022, No. 7, pp. 277-283 (7 pages)
Funding: Science and Technology Innovation Project of the Ministry of Culture (2015KJCXXM19).
Abstract: With the spread of smartphones and 5G networks, short videos have become a main way for people to acquire knowledge in fragmented spare time. To address the shortage of short video datasets from real-life scenarios and the low classification accuracy, this study proposes a dual-process short video classification method that incorporates deep learning. In the main process, an A-VGG-3D network model is constructed: a VGG network with an attention mechanism extracts features, and an optimized 3D Convolutional Neural Network (3D CNN) performs the classification, improving continuity, balance, and robustness along the temporal dimension. In the auxiliary process, the frame difference method detects shot changes to extract several frames from each short video, and multi-scale face detection is then performed on those frames by combining a sliding window mechanism with a cascade classifier, further improving classification accuracy. Experimental results show that, on the UCF101 dataset and a self-built dataset of real-life short videos, the precision and recall for non-plot and non-interview short videos reach up to 98.9% and 98.6%, respectively. Compared with a short video classification method based on the C3D network, the proposed method improves classification accuracy on UCF101 by 9.7 percentage points, indicating that it generalizes better.
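The frame-difference step of the auxiliary process can be illustrated with a minimal sketch. The abstract does not give the paper's actual difference measure or threshold, so the mean absolute pixel difference, the `threshold` value, and the function names `mean_abs_diff` and `select_key_frames` below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# A grayscale frame as rows of 0-255 intensities (illustrative representation).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_key_frames(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames assumed to start a new shot.

    Frame 0 always opens the first shot. A later frame is selected when its
    mean absolute difference from the previous frame exceeds `threshold`
    (a hypothetical value; the abstract does not specify one).
    """
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keys.append(i)
    return keys


if __name__ == "__main__":
    dark = [[10, 10], [10, 10]]
    bright = [[200, 200], [200, 200]]
    clip = [dark, dark, bright, bright, dark]
    print(select_key_frames(clip))  # prints [0, 2, 4]
```

In a pipeline like the one the abstract describes, the selected frames would then be passed to the face detector; that stage is omitted here because the abstract gives no parameters for it.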
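Keywords: 3D convolutional neural network; deep learning; VGG network; attention mechanism; short video classification
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]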