Emotional Voice Conversion Based on Multiple Domain Conditional Generation  Cited by: 1

Authors: YAO Wenhan; KE Dengfeng; HUANG Liangjie; HU Ruixin; XIANG Minte; ZHANG Jinsong [1]

Affiliation: [1] Department of Information Science, Beijing Language and Culture University, Beijing 100089, China

Source: Journal of Zhengzhou University: Natural Science Edition, 2023, No. 5, pp. 67-72 (6 pages)

Funding: Chinese Testing International Research Fund Project (HT-202011-374).

Abstract: Emotional voice conversion converts speech from one emotion to another without changing the speaker's timbre or the semantic content; in essence, it is style transfer for speech. Mainstream style transfer techniques include generative adversarial networks (such as CycleGAN and StarGAN) and instance normalization techniques (such as IN and CIN). Compared with IN, CIN adds a mean-and-variance selection module, which gives it stronger style transfer ability. This paper proposes CIN-StarGAN, an emotional voice conversion model that combines StarGAN and CIN by embedding the CIN module into the StarGAN generator. Experimental results on the ESD dataset show that CIN-StarGAN converges 28% faster than a CycleGAN-based emotion conversion model and has good style conversion ability. The approach has potential research value for multi-domain emotion conversion methods.
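
The mean-and-variance selection idea behind CIN can be sketched as follows. This is a minimal NumPy illustration of conditional instance normalization in general, not the authors' implementation: the function name, the tensor layout (batch, channels, frames), and the per-domain `gamma`/`beta` tables are all assumptions made for the example.

```python
import numpy as np

def conditional_instance_norm(x, domain_id, gamma, beta, eps=1e-5):
    """Hypothetical CIN layer (illustrative only).

    x         : (batch, channels, frames) feature map
    domain_id : (batch,) integer index of the target emotion domain
    gamma     : (num_domains, channels) per-domain scale
    beta      : (num_domains, channels) per-domain shift
    """
    # Plain instance normalization: statistics per sample and per channel,
    # computed over the time (frames) axis.
    mean = x.mean(axis=2, keepdims=True)
    var = x.var(axis=2, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # The "conditional" part: pick the scale and shift that belong to the
    # requested domain, then broadcast them over the frames axis.
    g = gamma[domain_id][:, :, None]
    b = beta[domain_id][:, :, None]
    return g * x_hat + b


# Usage: two utterances, four channels, fifty frames, two emotion domains.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 50))
gamma = np.array([[1.0] * 4, [2.0] * 4])
beta = np.array([[0.0] * 4, [3.0] * 4])
y = conditional_instance_norm(x, np.array([0, 1]), gamma, beta)
```

With a single domain the layer reduces to ordinary IN with a learned affine transform; the extra domain index is what lets one generator serve several target emotions.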

Keywords: emotional voice conversion; domain conversion; conditional instance normalization; generative adversarial network

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

 
