Presentation of Acoustic Characteristics with Network Models for Dialect Identification

Cited by: 2

Authors: SHEN Xiaohu [1]; JIN Tian [2]; LI Jiawei; HAN Chunrun

Affiliations: [1] Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing 210031, China; [2] Evidence Identification Center, Jiangsu Provincial Public Security Bureau, Nanjing 210031, China

Source: Forensic Science and Technology (《刑事技术》), 2021, No. 3, pp. 234-240 (7 pages)

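Funding: Ministry of Public Security Application Innovation Program (2020YYCXHNST046); Open Project of the National Engineering Laboratory for Crime Scene Evidence Traceability Technology (2018NELKFKT10).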

Abstract: Objective: To study how well speech recognition network models represent acoustic information, and to select the best single model for automatic dialect classification. Methods: SOM, RNN, LSTM and CNN models were implemented in simulation using Python, and suitable classifiers were chosen for training and validation experiments on the dialect classification task. Results: Under the multi-class PRF (precision, recall, F-score) evaluation metrics, the LSTM model obtained the best macro-averaged and micro-averaged scores, while the CNN model showed better noise robustness at low signal-to-noise ratios. Conclusion: Within an LSTM+CNN framework, dialect information is represented well and with strong robustness, which can support secondary development of automatic dialect classification applications.

Objective: To explore how well network models represent acoustic characteristics for dialect identification, so as to select the best single model for an automatic dialect classifier. Methods: Four typical neural network models for acoustic feature extraction, SOM (self-organizing feature map), RNN (recurrent neural network), LSTM (long short-term memory network) and CNN (convolutional neural network), were each simulated in Python. A dataset of typical dialects from 13 cities in Jiangsu province (6036 speech samples from 105 speakers) was divided into training, validation and test sets at a ratio of 6:2:2. The test set was then edited into subsets of 3-second and 10-second samples, and white noise was added to each to give subsets with signal-to-noise ratios (SNR) of 3 dB and 10 dB. This produced 4 test sets of 1207 samples each. Appropriate classifiers were chosen to evaluate the four models in training, validation and testing. For dialect identification, each model was assessed on its ability to extract features from test sets of differing SNR and duration. Results: With normalized data and network parameters, confusion matrices were obtained from the outputs of the 4 models on the 4 test sets, giving the Macro-F1 and Micro-F1 scores suited to evaluating multi-class problems. LSTM and CNN performed significantly better than SOM and RNN. SOM was markedly more sensitive to the SNR of the test samples, with poor identification accuracy on the 3 dB test set. RNN improved identification accuracy but represented the key information in long-duration samples insufficiently. LSTM achieves the optimal evaluation scores of 93.1% (Macro-F1
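The abstract mentions two steps that a short sketch may make more concrete: adding white noise to reach a target SNR, and deriving Macro-F1 and Micro-F1 from a confusion matrix. The Python below is only an illustration under stated assumptions. The function names `add_white_noise` and `f1_scores`, the Gaussian noise source, and the toy confusion matrix are hypothetical and are not taken from the paper, whose actual implementation is not shown in this record.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus Gaussian white noise scaled to `snr_db` decibels.

    Assumption: the noise is zero-mean Gaussian; the paper does not say
    how its white noise was generated.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise power.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [x + gain * n for x, n in zip(signal, noise)]


def f1_scores(confusion):
    """Return (macro_f1, micro_f1) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(k)) - tp  # predicted c wrongly
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        total_tp += tp
        total_fp += fp
        total_fn += fn
    # Macro-F1 averages per-class F1; Micro-F1 pools the counts first.
    macro_f1 = sum(per_class_f1) / k
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro_f1 = 2 * total_tp / micro_denom if micro_denom else 0.0
    return macro_f1, micro_f1


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix, for illustration only.
    toy = [[5, 1, 0], [1, 4, 1], [0, 1, 6]]
    print(f1_scores(toy))
```

For single-label classification, Micro-F1 computed this way equals overall accuracy, while Macro-F1 weights every dialect class equally regardless of how many samples it has. That may be why the abstract reports both.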

Keywords: dialect identification; acoustic model; acoustic information representation; automatic classification

Classification number: D793.2 [Politics and Law: Political Science]
