Diverse style oriented many-to-many emotional voice conversion

Authors: ZHOU Jian [1], LUO Xiangyu, WANG Huabin [1], ZHENG Wenming [2], TAO Liang [1]

Affiliations: [1] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei 230601; [2] Key Laboratory of Child Development and Learning Science, Ministry of Education, Southeast University, Nanjing 210096

Source: Acta Acustica (《声学学报》), 2024, No. 6, pp. 1297-1303 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (U2003207, 61902064).

Abstract: Existing generative adversarial network (GAN)-based emotional voice conversion methods still suffer from insufficient emotion separation, and the converted speech lacks diversity in emotional expression. To address these issues, this paper proposes a diverse-style-oriented many-to-many emotional voice conversion method. The method is built on a GAN with a dual-generator structure, where a consistency loss imposed on the intermediate encodings of the two generators ensures that linguistic content and speaker characteristics remain consistent, thereby improving the similarity between the converted emotion and the target emotion. In addition, an emotion mapping network and an emotion feature encoder supply the generators with diversified emotional representations of the same emotion category. Experimental results show that the proposed method produces converted speech whose emotion is closer to the target emotion and whose emotional styles are richer.
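To make the two mechanisms summarized in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch; it is not the authors' implementation, and all module names, layer sizes, and the loss wiring are illustrative assumptions. It shows (1) an L1 consistency loss tying together the intermediate encodings of two generators so that content and speaker information stay aligned, and (2) a mapping network that turns a random latent vector plus an emotion label into diverse style codes for the same emotion, paired here with a StarGAN v2-style diversity term as one possible way to encourage varied outputs.

# Hypothetical sketch of a dual-generator consistency loss and an emotion
# mapping network for style diversity. Shapes and modules are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Encoder-decoder generator; the encoder output plays the role of the
    'intermediate encoding' that carries content and speaker information."""
    def __init__(self, feat_dim=80, hidden_dim=256, style_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, hidden_dim))
        self.decoder = nn.Sequential(nn.Linear(hidden_dim + style_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, feat_dim))

    def forward(self, x, style):
        enc = self.encoder(x)                                  # intermediate encoding
        out = self.decoder(torch.cat([enc, style], dim=-1))    # emotion injected via style code
        return out, enc

class MappingNetwork(nn.Module):
    """Maps a random latent z plus an emotion label to a style code, so that
    different z give diverse styles within the same emotion category."""
    def __init__(self, latent_dim=16, style_dim=64, num_emotions=4):
        super().__init__()
        self.embed = nn.Embedding(num_emotions, latent_dim)
        self.mlp = nn.Sequential(nn.Linear(latent_dim * 2, 128), nn.ReLU(),
                                 nn.Linear(128, style_dim))

    def forward(self, z, emotion_id):
        return self.mlp(torch.cat([z, self.embed(emotion_id)], dim=-1))

# Toy forward pass with assumed mel-spectrogram-like features
B, T, FEAT = 2, 100, 80
x = torch.randn(B, T, FEAT)            # source speech features
g1, g2 = Generator(), Generator()
mapper = MappingNetwork()

emotion = torch.tensor([1, 1])                  # target emotion label for the batch
s_a = mapper(torch.randn(B, 16), emotion)       # two different style codes
s_b = mapper(torch.randn(B, 16), emotion)       # for the same target emotion

y1, enc1 = g1(x, s_a.unsqueeze(1).expand(-1, T, -1))
y2, enc2 = g2(x, s_b.unsqueeze(1).expand(-1, T, -1))

# Consistency loss: the two generators' intermediate encodings should agree,
# keeping linguistic content and speaker identity stable across generators.
loss_consistency = F.l1_loss(enc1, enc2)

# Diversity term (StarGAN v2 style, an assumed stand-in): outputs driven by
# different style codes of the same emotion should differ from each other.
loss_diversity = -F.l1_loss(y1.detach(), y2)

print(loss_consistency.item(), loss_diversity.item())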

Keywords: emotional voice conversion; style diversification; generative adversarial network; emotion encoding

CLC number: TN912.3 [Electronics and Telecommunications / Communication and Information Systems]

 
