Research on Video Expression Recognition Based on End-to-End Enhanced Feature Neural Convolution Network

Original title: 端到端增强卷积网络的视频人脸表情识别研究

Authors: TANG Wu-bin [1]; TONG Ying [2]; CAO Xue-hong [2]

Affiliations: [1] College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, Jiangsu, China

Source: Software Guide (《软件导刊》), 2022, No. 3, pp. 42-48 (7 pages)

Funding: National Natural Science Foundation of China (61703201); Natural Science Foundation of Jiangsu Province (BK20170765).

Abstract: Video facial expression recognition is widely used in autonomous driving, smart healthcare, and other fields. To address the information loss that occurs when features are extracted from individual video frames, a single-frame enhanced convolutional network is proposed that achieves feature enhancement by fusing shallow and deep features. The shallow features are extracted by an extension convolution module branching from an intermediate CNN layer. The deep features are obtained at the last CNN layer by combining dilated convolution with an inter-channel attention mechanism, which re-weights the feature channels and combines strong and weak information. Because adjacent video frames are correlated, a multi-frame enhanced convolutional network is also proposed: an inter-frame attention mechanism scores each frame according to the correlation between frames, identifies the key frames of the video, and thereby enhances the multi-frame features. The model was validated on the AFEW, CK+, SFEW, and FER datasets. On AFEW, video expression recognition accuracy rose from 40.00% to 45.19%, and the F1 score rose from 0.31 to 0.3937. The model applies to both static images and dynamic video, improving recognition accuracy, reducing error, and thus raising recognition efficiency.
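The inter-frame attention step described in the abstract (score each frame by its correlation with the other frames, then emphasize the key frames) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring rule, so the mean cosine similarity, the softmax weighting, and the names `cosine` and `frame_attention` are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def frame_attention(features):
    """Weight frames by their similarity to the other frames, then fuse them.

    features: non-empty list of per-frame feature vectors (lists of floats).
    Returns (weights, fused); the weights are non-negative and sum to 1.
    """
    n = len(features)
    if n == 0:
        raise ValueError("at least one frame is required")

    # Assumed scoring rule: average cosine similarity to every other frame.
    scores = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 0.0)

    # Softmax turns the scores into attention weights (assumed normalization).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frame features gives one video-level feature vector.
    dim = len(features[0])
    fused = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return weights, fused


# Toy 2-D features for three frames; real features would come from the CNN.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, fused = frame_attention(frames)
key_frame = max(range(len(frames)), key=lambda i: weights[i])
```

In this toy input the second frame is the most similar to the others on average, so it receives the largest weight and is picked as the key frame. A trained network would learn the scoring function rather than use a fixed similarity measure.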

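Keywords: expression recognition; single-frame enhanced convolutional network; attention mechanism; multi-frame enhanced convolutional network; AFEW dataset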

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]
