A method of synthetic spoofing speech detection using self-supervised contrastive learning

Authors: YANG Man; JIAN Zhihua [1]; LIANG Chenghan (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)

Affiliation: [1] School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China

Source: Telecommunications Science, 2024, No. 11, pp. 40-49 (10 pages)

Funding: National Natural Science Foundation of China (No. 61201301, No. 61772166).

Abstract: To eliminate the impact that the imbalance between the numbers of bonafide and spoofed speech samples in the training dataset has on the performance of synthetic spoofing speech detection systems, and to further improve detection accuracy, a synthetic speech detection method based on self-supervised contrastive learning is proposed. The method treats pitch-transformed samples as negative samples and trains a neural network so that the anchor-sample features differ from the negative-sample features, which drives the network to extract features that are sensitive to pitch transformation. A deep residual network then serves as the back-end classifier that judges whether the speech is bonafide or spoofed. Experimental results show that, compared with methods based on traditional hand-crafted acoustic features, deep learning-based spoofing speech detection systems, and end-to-end spoofing speech detection systems, the proposed method significantly reduces the equal error rate. Because the method trains the network to extract pitch-sensitive features and is not affected by the imbalance between bonafide and spoofed speech in the dataset, it significantly improves the accuracy of synthetic spoofing speech detection.
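The contrastive idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss function, the pitch-transformation algorithm, or the network, so everything below is an assumption made for illustration: the pitch shift is a crude resampling (which also changes duration), the loss is a generic InfoNCE-style objective, the positive view is hypothetical (the abstract only specifies the negatives), and the embeddings are plain lists standing in for network outputs. The function names are likewise hypothetical.

```python
import math


def pitch_shift_by_resampling(signal, semitones):
    """Crude pitch shift (assumption, not the paper's transform).

    Resamples by 2**(semitones/12) using linear interpolation, which
    raises or lowers pitch but also shortens or lengthens the signal.
    """
    if not signal:
        return []
    factor = 2.0 ** (semitones / 12.0)
    n_out = max(1, int(len(signal) / factor))
    shifted = []
    for i in range(n_out):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        shifted.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return shifted


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed form).

    The loss is small when the anchor is close to the positive and far
    from every negative. Here the negatives stand for features of
    pitch-transformed samples.
    """
    pos_term = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg_term = sum(
        math.exp(cosine_similarity(anchor, n) / temperature) for n in negatives
    )
    return -math.log(pos_term / (pos_term + neg_term))


# Toy feature vectors standing in for network outputs.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]            # hypothetical pitch-preserving view
negative_far = [-1.0, 0.0]       # pitch-shifted sample mapped far from the anchor
negative_near = [1.0, 0.0]       # pitch-shifted sample mapped onto the anchor

loss_far = contrastive_loss(anchor, positive, [negative_far])
loss_near = contrastive_loss(anchor, positive, [negative_near])
```

In this toy setting the loss is lower when the pitch-shifted negative lies far from the anchor, so minimizing it pushes a network toward features that change under pitch transformation, which is the behavior the abstract describes.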

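Keywords: spoofing speech detection; synthetic speech detection; self-supervised contrastive learning; deep residual network; pitch transformation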

Classification: TN912 [Electronics and Telecommunications: Communication and Information Systems]
