Speaker recognition based on deep bidirectional LSTM network  (Cited by: 6)


Author: WANG Hua-peng (王华朋) [1] (Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang 110854, China)

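Affiliation: [1] Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang 110854, Liaoning, China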

Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 6, pp. 1768-1772 (5 pages)

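Funding: 2017 National Key R&D Program of China (2017YFC0821000); 2016 National Social Science Fund of China, Key Project (16AYY015); Liaoning Province Key R&D Program (2017231006, 2017231004); Ministry of Public Security, Public Security Theory and Soft Science Program (2017LLYJXJXY040); Open Fund of the Chongqing Universities Key Laboratory of Forensic Science and Technology (Southwest University of Political Science and Law) (XKZDSYS2019-Z1); Open Project Fund of the Shanghai Key Laboratory of Crime Scene Evidence (2018XCWZK09).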

Abstract: To further improve the accuracy of speaker recognition, a method based on a deep bidirectional long short-term memory (LSTM) network is proposed that performs text-independent, end-to-end speaker identification. The network processes the speech sequence in both the forward and backward directions, learning long-term dependencies between time steps, and uses memory units to strengthen the connection between upper and lower layers, which improves its ability to classify speech sequence data. Experimental results show that, for short utterances of 5 s recorded in a laboratory environment, the correct recognition rate reaches 97.92%, and that the method is robust to noise interference. The proposed method can learn the sequence features of speech signals and exploit how they change over the sequence, and so can discriminate speakers effectively.
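The architecture named in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. This is not the paper's implementation: the feature dimension, hidden size, number of layers, mean pooling over time, softmax output layer, and all function names (`build_model`, `predict`, and so on) are assumptions made for illustration, and the weights are random and untrained.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_size, hidden_size, rng):
    # One weight row per gate unit over [x; h; 1]; gate order is i, f, g, o.
    cols = input_size + hidden_size + 1
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(4 * hidden_size)]

def lstm_pass(seq, W, H):
    # Run one LSTM direction over seq and return the hidden state at every step.
    h, c, outputs = [0.0] * H, [0.0] * H, []
    for x in seq:
        z = x + h + [1.0]  # input, previous hidden state, bias term
        pre = [sum(w * v for w, v in zip(row, z)) for row in W]
        i = [sigmoid(p) for p in pre[0:H]]            # input gate
        f = [sigmoid(p) for p in pre[H:2 * H]]        # forget gate
        g = [math.tanh(p) for p in pre[2 * H:3 * H]]  # candidate cell values
        o = [sigmoid(p) for p in pre[3 * H:4 * H]]    # output gate
        c = [f[k] * c[k] + i[k] * g[k] for k in range(H)]  # memory cell update
        h = [o[k] * math.tanh(c[k]) for k in range(H)]
        outputs.append(h)
    return outputs

def bilstm_layer(seq, W_fwd, W_bwd, H):
    # Forward pass, plus a backward pass over the reversed sequence,
    # re-reversed so both outputs align per time step, then concatenated.
    fwd = lstm_pass(seq, W_fwd, H)
    bwd = lstm_pass(seq[::-1], W_bwd, H)[::-1]
    return [a + b for a, b in zip(fwd, bwd)]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def build_model(feat_dim, hidden_size, num_layers, num_speakers, seed=0):
    # Hypothetical hyperparameters; the abstract does not state them.
    rng = random.Random(seed)
    layers, in_dim = [], feat_dim
    for _ in range(num_layers):
        layers.append((init_lstm(in_dim, hidden_size, rng),
                       init_lstm(in_dim, hidden_size, rng)))
        in_dim = 2 * hidden_size  # the next layer sees both directions
    W_out = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + 1)]
             for _ in range(num_speakers)]
    return {"layers": layers, "W_out": W_out, "hidden": hidden_size}

def predict(model, seq):
    # Stack the bidirectional layers, then mean-pool over time (an assumption)
    # and map the pooled vector to speaker probabilities.
    for W_fwd, W_bwd in model["layers"]:
        seq = bilstm_layer(seq, W_fwd, W_bwd, model["hidden"])
    pooled = [sum(step[k] for step in seq) / len(seq) for k in range(len(seq[0]))]
    logits = [sum(w * v for w, v in zip(row, pooled + [1.0]))
              for row in model["W_out"]]
    return softmax(logits)

# Usage: 20 frames of 13-dimensional features (random stand-ins for real
# acoustic features), 2 stacked bidirectional layers, 4 candidate speakers.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
model = build_model(feat_dim=13, hidden_size=8, num_layers=2, num_speakers=4)
probs = predict(model, frames)
```

Because the weights are untrained, `probs` is only a well-formed probability vector over the four hypothetical speakers. A practical system would train the weights on labelled speech and would more likely use a deep learning framework than hand-written loops.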

Keywords: long short-term memory; end-to-end; speaker recognition; deep learning; recurrent neural network

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
