Authors: 李永伟 (LI Yongwei), 陶建华 (TAO Jianhua), 李凯 (LI Kai)
Affiliations: [1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [2] Department of Automation, Tsinghua University, Beijing 100084, China; [3] Japan Advanced Institute of Science and Technology, Ishikawa 923-1211, Japan
Source: Journal of Signal Processing (《信号处理》), 2023, No. 4, pp. 632-638 (7 pages)
Funding: National Natural Science Foundation of China (62201571, U21B2010).
Abstract: Speech emotion recognition is an indispensable part of natural human-computer interaction and an important component of artificial intelligence. The regulation of the speech production organs causes differences in the acoustic features of emotional speech, through which different emotions are perceived. Traditional speech emotion recognition methods classify emotions only from acoustic or auditory features, ignoring the important role that production-related features such as the glottal source waveform and the vocal tract shape play in emotion perception. In our previous work, the contributions of glottal source and vocal tract cues to emotion perception in speech were analyzed theoretically, but these features were not used for speech emotion recognition. Therefore, this paper revisits the possibility of using glottal source and vocal tract cues for speech emotion recognition from the point of view of speech production and, motivated by the source-filter model of speech production, proposes a new speech emotion recognition method based on glottal source and vocal tract features. First, the glottal source and vocal tract features are estimated simultaneously from emotional speech signals by an analysis-by-synthesis procedure using a source-filter model built from an Auto-Regressive eXogenous (ARX) model and the Liljencrants-Fant (LF) model. Then, the estimated glottal source and vocal tract features are fed into a Bidirectional Gated Recurrent Unit (BiGRU) network for the emotion classification task. Recognition experiments were conducted on the public Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. The results show that the glottal source and vocal tract features can effectively distinguish emotions and that their recognition accuracy is superior to that of several traditional emotion features. By studying speech emotion recognition through the production-related glottal source and vocal tract, this paper provides a new perspective for speech emotion recognition technology.
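As an illustration of the classification stage described in the abstract, the following is a minimal sketch, assuming a PyTorch implementation in which per-frame glottal source and vocal tract features are fed into a bidirectional GRU followed by mean pooling and a linear output layer over four IEMOCAP emotion classes. The feature dimension, hidden size, pooling choice, and class count are illustrative assumptions, not the authors' reported configuration; the feature extraction via the ARX-LF model is not shown.

    # Hypothetical sketch of the BiGRU classification stage (not the authors' code).
    import torch
    import torch.nn as nn

    class BiGRUEmotionClassifier(nn.Module):
        def __init__(self, feat_dim=40, hidden_dim=128, num_classes=4):
            super().__init__()
            # Bidirectional GRU over the per-frame feature sequence
            self.bigru = nn.GRU(feat_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
            # Linear layer mapping pooled GRU states to emotion logits
            self.fc = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, x):
            # x: (batch, frames, feat_dim) glottal-source / vocal-tract features
            out, _ = self.bigru(x)        # (batch, frames, 2*hidden_dim)
            pooled = out.mean(dim=1)      # mean-pool over time
            return self.fc(pooled)        # (batch, num_classes) emotion logits

    # Example usage with assumed shapes: 8 utterances, 200 frames, 40-dim features
    logits = BiGRUEmotionClassifier()(torch.randn(8, 200, 40))

Mean pooling over time is used here only to collapse the variable-length sequence into a fixed-size vector; the paper does not specify how the BiGRU outputs are aggregated.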
Keywords: speech emotion features; glottal source and vocal tract; source-filter model; speech emotion recognition
Classification code: TP37 [Automation and Computer Technology - Computer System Architecture]