Research on Siamese Bi-LSTM Model in Speech Spoofing Detection

Authors: GAN Hai-lin, LEI Zhen-chun[1], YANG Yin-gen[1]

Affiliation: [1] School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China

Source: Journal of Chinese Computer Systems, 2022, No. 6, pp. 1265-1271 (7 pages)

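Funding: Supported by the National Natural Science Foundation of China (61662030, 62067004) and the Science and Technology Research Project of the Jiangxi Provincial Department of Education (170205).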

Abstract: In speech spoofing detection, a Gaussian Mixture Model (GMM) accumulates the scores of all speech frames independently and ignores the contribution of each Gaussian component to the final score. This paper models the score on each GMM component and constructs Gaussian Probability Features (GPF) from Linear Frequency Cepstral Coefficients (LFCC). It combines a bidirectional LSTM, which captures the dependencies between preceding and following speech frames, with a siamese network, which has strong classification ability, and applies the resulting Siamese Bidirectional Long Short-Term Memory (SBi-LSTM) model to speech spoofing detection. Detection proceeds in three steps: first, two GMMs are trained, one on genuine speech and one on spoofed speech; second, the GMMs are used to compute the GPF of each utterance; finally, the SBi-LSTM model performs binary classification on the input GPF. Experiments on the ASVspoof 2019 dataset show that the SBi-LSTM model significantly outperforms the GMM: compared with the GMM, min t-DCF and EER are reduced by 47.62% and 48.35%, respectively, in the logical access (LA) scenario, and by 31.03% and 39.69%, respectively, in the physical access (PA) scenario. Fusing the scores of the SBi-LSTM model and the GMM improves performance further: compared with the GMM, min t-DCF and EER are reduced by 71.43% and 70.62%, respectively, in the LA scenario, and by 34.48% and 45.74%, respectively, in the PA scenario.
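The abstract contrasts two uses of the same GMM: conventional scoring, which sums the Gaussian components away frame by frame, and GPF, which keeps one value per component. The sketch below illustrates that difference in plain Python. It assumes diagonal-covariance GMMs whose parameters are already trained (for example with EM on LFCC frames), and it assumes each GPF entry is the per-component log-probability log(w_i N(x_t; mu_i, Sigma_i)), mean/variance normalized per component over the utterance; the paper's exact definition and normalization may differ. The names `DiagGMM`, `gmm_utterance_score`, `llr_score`, and `gaussian_probability_features` are hypothetical.

```python
import math
from typing import List, Sequence


class DiagGMM:
    """A trained GMM with diagonal covariances (parameters assumed given, e.g. from EM)."""

    def __init__(self, weights: Sequence[float], means: Sequence[Sequence[float]],
                 variances: Sequence[Sequence[float]]) -> None:
        self.weights = list(weights)
        self.means = [list(m) for m in means]
        self.variances = [list(v) for v in variances]

    def component_log_probs(self, x: Sequence[float]) -> List[float]:
        """Return log(w_i * N(x; mu_i, diag(var_i))) for every component i."""
        out = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lp = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                lp -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            out.append(lp)
        return out


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_utterance_score(gmm: DiagGMM, frames: Sequence[Sequence[float]]) -> float:
    """Conventional scoring: per-frame log-likelihoods averaged over the utterance.
    The per-component terms are collapsed inside log_sum_exp."""
    return sum(log_sum_exp(gmm.component_log_probs(x)) for x in frames) / len(frames)


def llr_score(genuine: DiagGMM, spoof: DiagGMM, frames: Sequence[Sequence[float]]) -> float:
    """Baseline decision score: positive values favour genuine speech."""
    return gmm_utterance_score(genuine, frames) - gmm_utterance_score(spoof, frames)


def gaussian_probability_features(gmm: DiagGMM,
                                  frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """T x M matrix that keeps every component's log-probability for every frame.
    Each component (column) is mean/variance normalised over the utterance (assumption)."""
    feats = [gmm.component_log_probs(x) for x in frames]
    n_frames = len(feats)
    for i in range(len(gmm.weights)):
        column = [row[i] for row in feats]
        mean = sum(column) / n_frames
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n_frames)
        std = std if std > 0 else 1.0  # guard against a constant column
        for row in feats:
            row[i] = (row[i] - mean) / std
    return feats
```

Under this reading, the genuine-speech GMM and the spoofed-speech GMM each yield a T x M GPF sequence for the same utterance. Two such sequences fit naturally into the two weight-sharing branches of a siamese Bi-LSTM, but the abstract does not state how the paper pairs them.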
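The percentages in the abstract compare each system with the GMM baseline. The sketch below reads them as relative reductions, which is the usual convention but is assumed here. It also assumes a simple weighted-sum fusion of z-normalized scores, because the abstract does not specify the fusion method. `relative_reduction`, `z_normalize`, `fuse_scores`, and the weight `alpha` are hypothetical illustrations, not the paper's fusion recipe.

```python
import math
from typing import List, Sequence


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction (%) of an error metric such as EER or min t-DCF."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0


def z_normalize(scores: Sequence[float]) -> List[float]:
    """Shift and scale scores to zero mean and unit variance (constant input maps to zeros)."""
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    if std == 0:
        return [0.0] * n
    return [(s - mean) / std for s in scores]


def fuse_scores(nn_scores: Sequence[float], gmm_scores: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted-sum fusion of two systems' scores for the same trials.
    alpha is a hypothetical weight that would normally be tuned on a development set."""
    if len(nn_scores) != len(gmm_scores):
        raise ValueError("score lists must cover the same trials")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a = z_normalize(nn_scores)
    b = z_normalize(gmm_scores)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
```

Read this way, a relative reduction of 47.62% means the new system's metric is about 52.38% of the baseline's value, whatever the baseline's absolute level. Z-normalizing before fusion keeps one system's score range from dominating the weighted sum.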

Keywords: anti-spoofing; speech spoofing detection; Gaussian probability features; Bi-LSTM; siamese network

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
