Research on Video Expression Recognition Based on End-to-End Enhanced Feature Neural Convolution Network


Authors: TANG Wu-bin, TONG Ying[2], CAO Xue-hong[2]

Affiliations: [1] College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, Jiangsu, China

Source: Software Guide (《软件导刊》), 2022, No. 3, pp. 42-48 (7 pages)

Funding: National Natural Science Foundation of China (61703201); Natural Science Foundation of Jiangsu Province (BK20170765).

Abstract: Video facial expression recognition is widely applied in fields such as autonomous driving and smart healthcare. To address the information loss that occurs when features are extracted from individual video frames, a single-frame enhanced convolutional network is proposed that enhances features by fusing shallow and deep features. The shallow features are extracted by extension convolution modules attached to the intermediate layers of the CNN. The deep features are produced at the last layer of the CNN by combining dilated convolution with an inter-channel attention mechanism, which recalibrates the feature channels and combines strong and weak information. Because adjacent video frames are correlated, a multi-frame enhanced convolutional network is further proposed. It uses an inter-frame attention mechanism that scores each frame according to the correlation between frames in order to identify the key frames of a video, thereby enhancing features across multiple frames. The model is evaluated on the AFEW, CK+, SFEW, and FER datasets. On AFEW, video expression recognition accuracy rises from 40.00% to 45.19% and the F1 score from 0.31 to 0.3937. The model applies to both static images and dynamic videos, and it improves expression recognition accuracy and reduces errors, thereby improving recognition efficiency.
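The two attention steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the inter-channel attention is assumed to follow the squeeze-and-excitation pattern (global average pooling, a reduction layer with ReLU, an expansion layer with sigmoid gates), and the inter-frame score is assumed to be each frame's mean cosine similarity to the other frames, normalized with a softmax. The function names `channel_attention` and `frame_attention_pool` and the `temperature` parameter are hypothetical. Dilated convolution and the shallow/deep feature fusion are not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: List[Vector], v: Vector) -> Vector:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def channel_attention(fmap: List[Vector], w1: List[Vector], w2: List[Vector]) -> List[Vector]:
    """Assumed SE-style channel recalibration (hypothetical helper).

    fmap: C channels, each a flattened H*W spatial map.
    w1: (C/r) x C reduction weights; w2: C x (C/r) expansion weights.
    """
    squeezed = [sum(ch) / len(ch) for ch in fmap]         # global average pool per channel
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]  # reduction layer + ReLU
    gates = [sigmoid(g) for g in matvec(w2, hidden)]      # expansion layer + sigmoid, one gate per channel
    # Strong channels keep most of their response, weak ones are suppressed.
    return [[gate * x for x in ch] for gate, ch in zip(gates, fmap)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def frame_attention_pool(frames: List[Vector], temperature: float = 1.0) -> Tuple[Vector, Vector]:
    """Assumed inter-frame attention (hypothetical helper).

    Scores each frame by its mean cosine similarity to the other frames,
    turns the scores into weights with a softmax, and returns
    (weights, weighted video-level feature).
    """
    n = len(frames)
    if n == 1:
        return [1.0], list(frames[0])
    scores = [
        sum(cosine(frames[i], frames[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    weights = softmax([s / temperature for s in scores])
    dim = len(frames[0])
    pooled = [sum(w * f[k] for w, f in zip(weights, frames)) for k in range(dim)]
    return weights, pooled
```

For example, `frame_attention_pool([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])` assigns the smallest weight to the third frame, because it is the least correlated with the other two; the weights always sum to 1, so the pooled vector is a convex combination of the frame features.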

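Keywords: expression recognition; single-frame enhanced convolutional network; attention mechanism; multi-frame enhanced convolutional network; AFEW dataset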

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]
