Fusing generative adversarial network and temporal convolutional network for Mandarin emotion recognition


Authors: LI Hai-feng; ZHANG Xue-ying [1]; DUAN Shu-fei [1]; JIA Hai-rong [1]; LIANG Hui-zhi (Huizhi Liang) (College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan 030024, China; School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom)

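Affiliations: [1] College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China; [2] School of Computing, Newcastle University, Newcastle upon Tyne NE1 7RU, Tyne and Wear, United Kingdom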

Source: Journal of Zhejiang University (Engineering Science), 2023, No. 9, pp. 1865-1875 (11 pages)

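Funding: National Natural Science Foundation of China (12004275); Shanxi Province Graduate Innovation Project (2022Y235); Shanxi Province Merit-based Funding Program for Science and Technology Activities of Overseas Returnees (20200017); Shanxi Province Research Funding Program for Returned Overseas Scholars (2019025, 2020042); Taiyuan University of Technology Research Start-up Fund for Introduced Talents (tyutrc201405b); Shanxi Province Applied Basic Research Program, General Natural Science Fund (20210302123186).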

Abstract: To investigate how acoustic-articulatory conversion affects Mandarin emotion recognition, an emotion recognition system that fuses acoustic and articulatory feature conversion was proposed. First, a multimodal audio-visual emotional Mandarin database was recorded based on the human articulation mechanism. Then, a bi-directional mapping generative adversarial network (Bi-MGAN) was designed to solve the feature conversion problem between the two modalities, and generator loss functions and mapping loss functions were defined to optimise the network. Finally, a residual temporal convolutional network based on feature-dimension attention (ResTCN-FDA) was constructed, using attention mechanisms to adaptively assign different weights to different feature types and different dimension channels. Experimental results show that Bi-MGAN achieves higher conversion accuracy than mainstream conversion networks in both the forward and the reverse mapping tasks. The evaluation metrics of ResTCN-FDA on the given emotion dataset are much higher than those of traditional emotion recognition algorithms. Fusing the real features with the mapped features significantly increases emotion recognition accuracy, demonstrating the positive effect of mapping on Mandarin emotion recognition.
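The abstract names a mapping loss for the bi-directional conversion between acoustic and articulatory features but does not give its form. The sketch below is only an illustration of the general idea, assuming an L1 mapping term in each direction plus a CycleGAN-style cycle-consistency term. The function names, the `cycle_weight` value, and the toy mappers are hypothetical, and the adversarial (generator/discriminator) term is omitted entirely.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def bidirectional_mapping_loss(
    acoustic: Vector,
    articulatory: Vector,
    forward: Callable[[Vector], Vector],   # hypothetical acoustic -> articulatory mapper
    backward: Callable[[Vector], Vector],  # hypothetical articulatory -> acoustic mapper
    cycle_weight: float = 10.0,            # assumed weight, not taken from the paper
) -> float:
    """Illustrative loss: direct mapping error plus weighted cycle-consistency error."""
    # Mapping term: each mapper should reproduce the paired target features.
    mapping = (
        l1_distance(forward(acoustic), articulatory)
        + l1_distance(backward(articulatory), acoustic)
    )
    # Cycle term: mapping to the other modality and back should recover the input.
    cycle = (
        l1_distance(backward(forward(acoustic)), acoustic)
        + l1_distance(forward(backward(articulatory)), articulatory)
    )
    return mapping + cycle_weight * cycle


if __name__ == "__main__":
    # Toy mappers standing in for trained networks.
    double = lambda v: [2.0 * x for x in v]
    halve = lambda v: [0.5 * x for x in v]
    print(bidirectional_mapping_loss([1.0, 2.0], [2.0, 4.0], double, halve))
```

With perfectly inverse toy mappers and matching paired features, both terms vanish; any mismatch in either direction raises the loss, which is the property a bi-directional mapping objective of this kind relies on.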

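Keywords: cycle generative adversarial network; emotion recognition; acoustic-articulatory conversion; temporal convolutional network; attention mechanism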

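Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391 [Automation and Computer Technology: Control Science and Engineering]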

 
