Improved End-to-end Synthetic Speech Detection Method Based on Auxiliary Learning


Authors: YUAN Tian-tian (袁甜甜); LI Zhi-hua (李志华)[1]; QIU Yang (邱阳). College of Energy and Electrical Engineering, Hohai University, Nanjing 211100, China

Affiliation: [1] College of Energy and Electrical Engineering, Hohai University, Nanjing, Jiangsu 211100, China

Source: Computer and Modernization (《计算机与现代化》), 2023, No. 5, pp. 52-57, 67 (7 pages in total)

Funding: Natural Science Foundation of Jiangsu Province (BK20151500).

Abstract: With the development of deepfake technology, synthetic speech detection faces growing challenges. This paper proposes a synthetic speech detection method that integrates auxiliary learning into an end-to-end model. After data alignment, the audio is fed directly into the improved end-to-end model without extracting any handcrafted features. The main task is binary classification of genuine versus synthetic speech. In parallel, discrimination among different types of synthetic speech serves as an auxiliary task, which provides a prior hypothesis for the main task's synthetic speech detection. The weighted combination of the main and auxiliary tasks is also optimized. Experiments on the public ASVspoof2019 and ASVspoof2015 datasets show three results for the improved model. It achieves a lower equal error rate (EER) than models that use handcrafted features. It outperforms the end-to-end model before the improvement. It also generalizes better to unknown attack types.
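The abstract names three technical ingredients without giving formulas:

* aligning raw audio before it enters the network,
* adding a weighted auxiliary attack-type loss to the main bona fide/spoof loss, and
* evaluating with the equal error rate (EER).

The Python sketch below illustrates each ingredient under stated assumptions. These are not the authors' code, and the function names are hypothetical.

* Alignment is assumed to be truncation or repeat-padding to a fixed length.
* Both heads are assumed to use softmax cross-entropy.
* Bona fide speech is assumed to be class 0 of the auxiliary head.
* A fixed, hypothetical `aux_weight` stands in for the paper's optimized weighting, which the abstract does not specify.

```python
import math
from typing import List, Sequence


def align_length(wave: Sequence[float], target_len: int) -> List[float]:
    """Truncate or repeat-pad a raw waveform to exactly target_len samples.

    Assumption: repeat-padding is used as the alignment scheme; the
    abstract does not say how the authors align their audio.
    """
    if not wave:
        raise ValueError("empty waveform")
    if len(wave) >= target_len:
        return list(wave[:target_len])
    repeats = math.ceil(target_len / len(wave))
    return (list(wave) * repeats)[:target_len]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def joint_loss(main_logits: Sequence[float], main_label: int,
               aux_logits: Sequence[float], aux_label: int,
               aux_weight: float = 0.5) -> float:
    """Main cross-entropy (0 = bona fide, 1 = spoof) plus a weighted
    auxiliary cross-entropy over synthesis/attack types.

    Assumptions: auxiliary class 0 means bona fide speech, and
    aux_weight is a hypothetical fixed weight, not the optimized
    weighting described in the paper.
    """
    main = -log_softmax(main_logits)[main_label]
    aux = -log_softmax(aux_logits)[aux_label]
    return main + aux_weight * aux


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """EER by sweeping every observed score as a decision threshold.

    labels: 1 = bona fide (target), 0 = spoof; a higher score means
    "more likely bona fide", and a trial is accepted if score >= threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        frr = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    print(align_length([1, 2, 3], 7))  # [1, 2, 3, 1, 2, 3, 1]
    print(joint_loss([0.0, 0.0], 1, [0.0, 0.0, 0.0], 2))
    print(equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # 0.0
```

Combining the losses as a weighted sum lets the gradient from the attack-type head shape the shared encoder while the bona fide/spoof decision stays the main objective. A threshold sweep is the simplest exact EER computation for small score lists. For large evaluation sets, an ROC-based interpolation is the usual choice.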

Keywords: deepfake; synthetic speech detection; auxiliary learning; weight optimization; end-to-end system

CLC number: TP183 [Automation and Computer Technology / Control Theory and Control Engineering]
