Authors: TANG Wu-bin (唐武宾); TONG Ying (童莹) [2]; CAO Xue-hong (曹雪虹) [2]
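Affiliations: [1] College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China; [2] School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, Jiangsu, China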
Source: Software Guide (《软件导刊》), 2022, No. 3, pp. 42-48 (7 pages)
Funding: National Natural Science Foundation of China (61703201); Natural Science Foundation of Jiangsu Province (BK20170765).
Abstract: Video facial expression recognition is widely used in autonomous driving, smart healthcare, and other fields. To address the information loss that occurs when features are extracted from a single video frame, the paper proposes a single-frame enhanced convolutional network that achieves feature enhancement by fusing shallow and deep features. The shallow features are extracted by an extension ("epitaxial") convolution module attached to an intermediate CNN layer. The deep features come from the last CNN layer, which fuses dilated convolution with an inter-channel attention mechanism to achieve feature-channel relocation and to combine strong and weak information. Because adjacent video frames are correlated, the paper also proposes a multi-frame enhanced convolutional network: an inter-frame attention mechanism scores each frame according to the correlation between frames, the scores identify the key frames of the video, and these drive multi-frame feature enhancement. The model is validated on the AFEW, CK+, SFEW, and FER datasets. On AFEW, video expression recognition accuracy rises from 40.00% to 45.19%, and the F1 score rises from 0.31 to 0.3937. The network applies to both static images and dynamic video, and it improves recognition accuracy, reduces error, and improves recognition efficiency.
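The inter-frame attention step described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring function, so everything here is an assumption made for illustration: frames are represented by plain feature vectors, a frame's score is its mean cosine similarity to the other frames, and a softmax turns scores into weights. The names `cosine`, `frame_attention`, and `aggregate` are hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def frame_attention(frames):
    """Return one softmax weight per frame.

    Assumption (not from the paper): a frame's raw score is its mean
    cosine similarity to all other frames in the clip.
    """
    n = len(frames)
    if n == 1:
        return [1.0]
    scores = []
    for i, f in enumerate(frames):
        sims = [cosine(f, g) for j, g in enumerate(frames) if j != i]
        scores.append(sum(sims) / len(sims))
    # Numerically stable softmax over the raw scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(frames, weights):
    """Attention-weighted sum of the frame features (one video-level vector)."""
    dim = len(frames[0])
    return [sum(w * f[d] for f, w in zip(frames, weights)) for d in range(dim)]


# Toy per-frame features; the third frame is deliberately unlike the others.
frames = [
    [1.0, 0.0, 0.2],
    [0.9, 0.1, 0.3],
    [0.0, 1.0, 0.0],
]
weights = frame_attention(frames)
key_frame = max(range(len(frames)), key=weights.__getitem__)
video_feature = aggregate(frames, weights)
print(key_frame, [round(w, 3) for w in weights])
```

In this toy clip the dissimilar third frame receives the smallest weight, and the frame with the largest weight is treated as the key frame. A trained network would learn the scoring rather than use a fixed similarity, so this only shows the shape of the computation.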
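Keywords: facial expression recognition; single-frame enhanced convolutional network; attention mechanism; multi-frame enhanced convolutional network; AFEW dataset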
Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]