Research on Long Speech Accent Recognition Based on Deep Learning


Authors: Zhu Danhao[1], Wang Zhen[2], Huang Xiaoyu, Ma Zhuang, Xu Jie

Affiliations: [1] Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031, China; [2] Department of Cadre Training, Jiangsu Police Institute, Nanjing 210031, China; [3] Department of Computer Information and Network Security, Jiangsu Police Institute, Nanjing 210031, China; [4] Jiangsu Province Zhangjiagang Public Security Bureau, Suzhou 215600, China

Source: Journal of Nanjing Normal University (Natural Science Edition), 2022, No. 4, pp. 110-118 (9 pages)

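Funding: National Natural Science Foundation of China (71974094); Jiangsu Provincial Social Science Fund (19TQD002); Natural Science Project of the Jiangsu Provincial Department of Education (21KJB520004); Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).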

Abstract: Mandarin accent recognition is one of the important techniques in forensic evidence identification. Existing Mandarin accent recognition methods are mostly built on traditional machine learning and are not designed specifically for long speech, so their recognition accuracy is limited. To address these problems, this paper proposes a deep-learning-based accent recognition method for long speech. The method first splits the long speech into multiple sentence-level short utterances, then extracts features from each utterance with a pre-trained X-vectors model, fuses the sentence-level features using different methods, and finally applies Amsoftmax to maximize the margin between accent classes and perform classification. Experiments on a real forensic-evidence accent recognition dataset show that the proposed method reaches a recognition accuracy of 94.1%, which is 21.6% and 2.1% higher than the non-deep-learning baseline and the X-vectors-based baseline, respectively. These results verify the effectiveness of the method and its ability to recognize accents in long speech.
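The last two stages described in the abstract (fusing sentence-level features, then classifying with a margin-based softmax) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which fusion methods were compared, so mean pooling is used here purely as an assumption, and the X-vectors extractor is replaced by toy vectors. The function names and the `scale` and `margin` values are also illustrative assumptions. The loss follows the usual additive-margin softmax formulation, which subtracts a margin from the target-class cosine similarity before scaling.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_scores(embedding, class_weights):
    """Cosine similarity between one embedding and each class weight vector."""
    x = l2_normalize(embedding)
    return [sum(a * b for a, b in zip(x, l2_normalize(w))) for w in class_weights]


def mean_fuse(sentence_embeddings):
    """Fuse sentence-level embeddings into one vector by averaging.

    Assumption: mean pooling stands in for the unspecified fusion methods.
    """
    n = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    return [sum(e[i] for e in sentence_embeddings) / n for i in range(dim)]


def am_softmax_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """Additive-margin softmax loss for a single example.

    scale and margin are illustrative values, not taken from the paper.
    """
    cosines = cosine_scores(embedding, class_weights)
    # Subtract the margin from the target class only, then scale all logits.
    logits = [scale * (c - margin if j == label else c)
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp for the cross-entropy term.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]


def predict(embedding, class_weights):
    """Pick the class with the highest cosine similarity (no margin at test time)."""
    cosines = cosine_scores(embedding, class_weights)
    return max(range(len(cosines)), key=cosines.__getitem__)


# Toy usage: three sentence-level vectors from one long recording, two accent classes.
sentences = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2]]
weights = [[1.0, 0.0], [0.0, 1.0]]
fused = mean_fuse(sentences)
print(predict(fused, weights), am_softmax_loss(fused, weights, label=0))
```

The margin makes the training objective stricter than plain softmax: for the same embedding, the loss with `margin > 0` is larger than with `margin = 0`, which pushes the classes further apart in cosine space.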

Keywords: deep learning; accent recognition; long speech; Mandarin

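Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TN912.34 [Automation and Computer Technology: Control Science and Engineering]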

 
