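Authors: YANG Man; JIAN Zhihua [1]; LIANG Chenghan (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)
Affiliation: [1] School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
Source: Telecommunications Science (《电信科学》), 2024, No. 11, pp. 40-49 (10 pages)
Funding: National Natural Science Foundation of China (No. 61201301, No. 61772166).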
Abstract: To eliminate the effect that an imbalance between the numbers of bona fide and fake speech samples in the training dataset has on the performance of a synthetic speech detection system, and to further improve detection accuracy, a synthetic speech detection method based on self-supervised contrastive learning is proposed. The method treats pitch-transformed samples as negative samples and trains a neural network so that the features of an anchor sample differ from those of its negative samples, which pushes the network to extract features that are sensitive to pitch transformation. A deep residual network then serves as the back-end classifier that decides whether the speech is genuine or fake. Experimental results show that, compared with traditional hand-crafted acoustic features, deep learning-based spoofed speech detection systems, and end-to-end spoofed speech detection systems, the proposed method significantly reduces the equal error rate of the system. Because the method trains the network to extract pitch-sensitive features and is not affected by the imbalance between bona fide and fake speech in the dataset, it significantly improves the accuracy of synthetic speech detection.
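Keywords: spoofed speech detection; synthetic speech detection; self-supervised contrastive learning; deep residual network; pitch transformation
Classification number: TN912 [Electronics and Telecommunications: Communication and Information Systems]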
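The abstract reports its results as the equal error rate (EER): the operating point at which the false acceptance rate (fake speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The sketch below shows one common way to estimate it from detector scores. It is not taken from the paper; the function name, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    Assumption (not from the paper): a higher score means "more likely bona fide",
    and a sample is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: bona fide samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: fake samples scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(bonafide, spoof))  # 0.25
```

With finite score sets the two error rates may never be exactly equal, so this sketch returns their average at the threshold where they are closest. Evaluation toolkits often interpolate between thresholds instead.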