Authors: LI Liangqi; ZHANG Xueying[1]; DUAN Shufei[1]; XIAO Zhongzhe[2]; JIA Hairong[1]; LIANG Huizhi (College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China; College of Optoelectronic Information Science and Engineering, Suzhou University, Suzhou, Jiangsu 215006, China; School of Computing, Newcastle University, Newcastle NE17RU, United Kingdom)
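Affiliations: [1] College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan, Shanxi 030024; [2] College of Optoelectronic Information Science and Engineering, Soochow University, Suzhou, Jiangsu 215006; [3] School of Computing, Newcastle University, Newcastle NE17RU, United Kingdom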
Source: Journal of Fudan University (Natural Science), 2024, No. 1, pp. 18-31 (14 pages)
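Funding: Youth Science Fund of the National Natural Science Foundation of China (12004275); General Program (Natural Science) of the Shanxi Province Applied Basic Research Program (20210302123186); Shanxi Province Merit-Based Funding Program for Science and Technology Activities of Returned Overseas Scholars (20200017); Research Start-up Fund for Introduced Talent, Taiyuan University of Technology (tyut-rc201405b).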
Abstract: This paper designs and builds a multimodal emotional speech database of Mandarin Chinese that includes articulatory kinematics, acoustics, glottal signals and facial micro-expressions, and describes corpus design, participant selection, recording details and data processing in detail. The signals are annotated with discrete emotion labels (neutral, pleasant, happy, apathetic, angry, sad, grief) and dimensional emotion labels (pleasure, activation, dominance). The dimensional annotations are analyzed statistically to verify their validity, and the annotators' SCL-90 scale data are analyzed together with the PAD annotations to explore the intrinsic relationship between outliers in the annotations and the annotators' psychological condition. To verify the speech quality and emotion discriminability of the database, three basic classification models, Support Vector Machine (SVM), Convolutional Neural Network (CNN) and Deep Neural Network (DNN), are used to compute the recognition rates of the seven emotions. The average recognition rate over all seven emotions reached 82.56% using acoustic data alone, 72.51% using glottal data alone, and 55.67% using kinematic data alone. The database is therefore of high quality and can serve as an important resource for the speech analysis research community, especially for multimodal emotional speech analysis.
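The abstract reports an "average recognition rate" over seven emotions but does not define it. The sketch below assumes it is the unweighted mean of per-class recognition rates (per-class recall) taken from a confusion matrix whose rows are true emotions and whose columns are predicted emotions. The function names and the demo counts are hypothetical illustrations, not the authors' code or data.

```python
# Hypothetical sketch: assumes "average recognition rate" = unweighted mean
# of per-class recall over a confusion matrix (rows = true, cols = predicted).
from typing import List, Sequence

EMOTIONS = ["neutral", "pleasant", "happy", "apathetic", "angry", "sad", "grief"]


def per_class_recognition_rates(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of each true class that was predicted correctly."""
    rates = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total == 0:
            raise ValueError(f"no samples for class {i}")
        rates.append(row[i] / total)  # diagonal entry = correct predictions
    return rates


def average_recognition_rate(confusion: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-class recognition rates."""
    rates = per_class_recognition_rates(confusion)
    return sum(rates) / len(rates)


# Invented demo: 10 utterances per emotion; errors go to the "next" emotion.
correct = [9, 8, 8, 7, 9, 8, 8]
demo = []
for i, c in enumerate(correct):
    row = [0] * len(EMOTIONS)
    row[i] = c
    row[(i + 1) % len(EMOTIONS)] = 10 - c
    demo.append(row)

print(f"{average_recognition_rate(demo):.2%}")  # 81.43%
```

With equal numbers of utterances per emotion, this mean equals overall accuracy; with unbalanced classes the two differ, which is why the definition matters when comparing the acoustic, glottal and kinematic results.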
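Keywords: emotional speech database; multimodal emotion recognition; dimensional emotion space; 3D electromagnetic articulograph; electroglottograph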
Classification number: TP392 [Automation and Computer Technology: Computer Application Technology]