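Author: WANG Hua-peng (王华朋) [1]
Affiliation: [1] Video and Audio Material Examination Department, Criminal Investigation Police University of China, Shenyang 110854, Liaoning, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 6, pp. 1768-1772 (5 pages)
Funding: National Key R&D Program of China, 2017 (2017YFC0821000); National Social Science Fund of China, Key Project, 2016 (16AYY015); Liaoning Province Key R&D Program (2017231006, 2017231004); Ministry of Public Security, Public Security Theory and Soft Science Fund (2017LLYJXJXY040); Open Fund of the Chongqing Universities Key Laboratory of Forensic Science and Technology (Southwest University of Political Science and Law) (XKZDSYS2019-Z1); Open Project Fund of the Shanghai Key Laboratory of Crime Scene Evidence (2018XCWZK09).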
Abstract: To further improve the accuracy of speaker recognition, a method based on a deep bidirectional long short-term memory (LSTM) network is proposed that performs end-to-end, text-independent speaker identification. The network processes the speech sequence in both the forward and backward directions, learning long-term dependencies between time steps, and uses its memory units to strengthen the connection between upper and lower layers, which improves its ability to classify speech sequence data. Experimental results indicate that, for short utterances of 5 s duration recorded in a laboratory environment, the correct recognition rate reaches 97.92%, and that the method is robust to noise interference. The proposed method can learn the sequence features of speech signals and exploit the information in how the sequence changes over time to discriminate speakers effectively.
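Keywords: long short-term memory; end-to-end; speaker recognition; deep learning; recurrent neural network
Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]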
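The abstract describes a stacked bidirectional LSTM that reads the utterance in both directions and classifies the speaker. The following is a minimal pure-Python sketch of that general idea, not the author's implementation: the feature dimension, hidden size, number of layers, random weights, mean pooling over time, and softmax classifier are all assumptions chosen for illustration, and the input frames are random numbers standing in for acoustic features.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random weights for one LSTM direction (hypothetical initialisation)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One (weights, bias) pair per gate: input, forget, candidate, output.
    return {gate: (mat(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "g", "o")}

def affine(weights_bias, v):
    """Compute W v + b for a (W, b) pair."""
    W, b = weights_bias
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_pass(params, frames, hidden_dim):
    """Run one LSTM direction over the frames; return the hidden state per step."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim  # memory cell
    outputs = []
    for x in frames:
        z = list(x) + h  # concatenate current input and previous hidden state
        i = [sigmoid(a) for a in affine(params["i"], z)]
        f = [sigmoid(a) for a in affine(params["f"], z)]
        g = [math.tanh(a) for a in affine(params["g"], z)]
        o = [sigmoid(a) for a in affine(params["o"], z)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        outputs.append(h)
    return outputs

def bilstm_layer(fwd, bwd, frames, hidden_dim):
    """Concatenate forward and time-aligned backward hidden states per frame."""
    forward = lstm_pass(fwd, frames, hidden_dim)
    backward = lstm_pass(bwd, frames[::-1], hidden_dim)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def speaker_posteriors(frames, layers, classifier, hidden_dim):
    """Stack bidirectional layers, mean-pool over time (assumption), classify."""
    seq = frames
    for fwd, bwd in layers:
        seq = bilstm_layer(fwd, bwd, seq, hidden_dim)
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    return softmax(affine(classifier, pooled))

# Illustrative sizes only; none of these values come from the paper.
rng = random.Random(0)
FEAT_DIM, HIDDEN, N_SPEAKERS = 13, 8, 4
layers = [
    (init_lstm(FEAT_DIM, HIDDEN, rng), init_lstm(FEAT_DIM, HIDDEN, rng)),
    (init_lstm(2 * HIDDEN, HIDDEN, rng), init_lstm(2 * HIDDEN, HIDDEN, rng)),
]
classifier = ([[rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN)]
               for _ in range(N_SPEAKERS)], [0.0] * N_SPEAKERS)
frames = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(20)]
probs = speaker_posteriors(frames, layers, classifier, HIDDEN)
print(probs)  # one probability per hypothetical speaker
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how the forward and backward passes are combined at each time step and how a second layer consumes the concatenated output of the first.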