Construction and Evaluation of a Mandarin Multimodal Emotional Speech Database

Original title: 普通话多模态情感语音数据库构建与评测

Authors: LI Liangqi (李良琦), ZHANG Xueying (张雪英) [1], DUAN Shufei (段淑斐) [1], XIAO Zhongzhe (肖仲喆) [2], JIA Hairong (贾海蓉) [1], LIANG Huizhi (梁慧芝) (College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China; College of Optoelectronic Information Science and Engineering, Suzhou University, Suzhou, Jiangsu 215006, China; School of Computing, Newcastle University, Newcastle NE1 7RU, United Kingdom)

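Affiliations: [1] College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China; [2] College of Optoelectronic Information Science and Engineering, Suzhou University, Suzhou, Jiangsu 215006, China; [3] School of Computing, Newcastle University, Newcastle NE1 7RU, United Kingdom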

Source: Journal of Fudan University (Natural Science) (《复旦学报（自然科学版）》), 2024, No. 1, pp. 18-31 (14 pages)

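Funding: Youth Science Fund of the National Natural Science Foundation of China (12004275); General Natural Science Fund of the Shanxi Province Applied Basic Research Program (20210302123186); Shanxi Province merit-based funding program for science and technology activities of returned overseas scholars (20200017); Taiyuan University of Technology research start-up fund for introduced talent (tyut-rc201405b).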

Abstract: This paper designs and builds a multimodal emotional speech database for Mandarin Chinese covering articulatory kinematics, acoustics, glottal signals, and facial micro-expressions. Corpus design, participant selection, recording details, and data processing are each described in detail. The signals are annotated with discrete emotion labels (neutral, pleasant, happy, apathetic, angry, sad, grief) and dimensional emotion labels (pleasure, activation, dominance; PAD). The dimensional annotations are analyzed statistically to verify their validity. The annotators' SCL-90 scale data are also verified and then analyzed together with the PAD annotations to explore the relationship between outliers in the annotations and the annotators' psychological condition. To verify the speech quality and emotional discriminability of the database, three basic classification models, Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Deep Neural Network (DNN), are used to compute recognition rates for the seven emotions. The average recognition rate over the seven emotions reaches 82.56% using acoustic data alone, 72.51% using glottal data alone, and 55.67% using kinematic data alone. The database is therefore of high quality and can serve as an important resource for speech analysis research, especially multimodal emotional speech analysis.
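Note: the abstract reports an "average recognition rate" over seven emotions but does not define how it is computed. The sketch below shows one common reading, assumed here and not confirmed by the source: the recognition rate of each emotion is the fraction of its samples classified correctly, and the average is the unweighted mean of those per-class rates. The function names and the toy labels are hypothetical and are not data from the database.

```python
from collections import Counter

# The seven discrete emotion labels named in the abstract.
EMOTIONS = ["neutral", "pleasant", "happy", "apathetic", "angry", "sad", "grief"]


def per_class_recognition_rate(y_true, y_pred, labels=EMOTIONS):
    """Return {label: fraction of that label's samples predicted correctly}.

    Labels with no samples in y_true are skipped.
    """
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {lab: correct[lab] / totals[lab] for lab in labels if totals[lab] > 0}


def average_recognition_rate(y_true, y_pred, labels=EMOTIONS):
    """Unweighted mean of the per-class rates (an assumed definition)."""
    rates = per_class_recognition_rate(y_true, y_pred, labels)
    if not rates:
        raise ValueError("no samples for any of the given labels")
    return sum(rates.values()) / len(rates)


# Toy example (made-up labels, for illustration only).
y_true = ["neutral", "neutral", "happy", "happy", "angry", "angry", "sad", "sad"]
y_pred = ["neutral", "apathetic", "happy", "happy", "angry", "sad", "sad", "sad"]

rates = per_class_recognition_rate(y_true, y_pred)
avg = average_recognition_rate(y_true, y_pred)
print(rates)
print(avg)
```

If the classes are unbalanced, this unweighted mean differs from overall accuracy, so a figure such as 82.56% should be compared only against results computed with the same definition.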

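Keywords: emotional speech database; multimodal emotion recognition; dimensional emotion space; three-dimensional electromagnetic articulograph; electroglottograph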

Classification number: TP392 [Automation and Computer Technology: Computer Application Technology]
