Authors: TAO Ziyu (陶子钰); SU Zhaopin (苏兆品) [1,3,4]; LIAN Chensi (廉晨思); WANG Niansong (王年松); ZHANG Guofu (张国富) [1,3,4]
Affiliations: [1] School of Computer and Information Technology, Hefei University of Technology, Hefei 230009, Anhui, China; [2] Department of Physical Evidence Identification, Anhui Public Security Department, Hefei 230000, Anhui, China; [3] Intelligent Interconnected Systems Laboratory of Anhui Province, Hefei University of Technology, Hefei 230009, Anhui, China; [4] Joint Laboratory of Intelligent Prevention and Recognition of Audio and Video, Hefei 230009, Anhui, China
Source: Journal of Applied Sciences (《应用科学学报》), 2024, No. 5, pp. 782-794 (13 pages)
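Funding: Supported by the Anhui Provincial Key Research and Development Program (No. 202104d07020001) and the Anhui Provincial Natural Science Foundation (No. 2208085MF166).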
Abstract: In the field of speaker identification (SID) systems, attacks often rely on fast gradient descent and mapping gradient descent algorithms, which suffer from unstable attack performance and poor auditory quality of the generated attack samples. This paper proposes an attack method against automatic SID systems based on a deep voiceprint feature conversion network, which generates attack speech carrying the target speaker's timbre. Specifically, the attack process on an SID system is first analyzed to determine how attack speech is generated. Then, a generator based on a two-dimensional convolutional neural network is designed to effectively integrate the speech content of the source speaker with the voiceprint features of the target speaker, and a discriminator based on adversarial learning is designed to improve the quality of the attack speech. Finally, comparative experiments are conducted on two automatic SID systems, one based on generalized end-to-end loss and the other on AMSoftmax loss. Experimental results demonstrate that the proposed method not only improves the stability of attack performance but also enhances the perceived auditory quality of the attack speech. Moreover, the proposed method is applicable to short-duration samples, making it suitable for practical attack scenarios.
Keywords: speaker identification; attack speech; voiceprint feature conversion; convolutional neural network
Classification: TP389.1 [Automation and Computer Technology: Computer System Architecture]