Authors: SHEN Xiaohu [1], JIN Tian [2], LI Jiawei, HAN Chunrun
Affiliations: [1] Department of Forensic Science and Technology, Jiangsu Police Institute, Nanjing 210031, China; [2] Evidence Identification Center, Jiangsu Provincial Public Security Bureau, Nanjing 210031, China
Source: Forensic Science and Technology (《刑事技术》), 2021, No. 3, pp. 234-240 (7 pages)
Funding: Ministry of Public Security Application Innovation Program (2020YYCXHNST046); Open Project of the National Engineering Laboratory for On-Site Physical Evidence Tracing Technology (2018NELKFKT10).
Abstract (translated from the Chinese): Objective To study how well speech recognition network models represent acoustic information, and to select the best single model for automatic dialect classification. Methods SOM, RNN, LSTM and CNN models were implemented in Python, and suitable classifiers were chosen for training and validation experiments on a dialect classification task. Results Under the multi-class precision, recall and F-score (PRF) metrics, the LSTM model achieved the best macro-averaged and micro-averaged scores, while the CNN model showed better noise robustness at low signal-to-noise ratios. Conclusion An LSTM+CNN framework represents dialect information well and is also highly robust, so it can support secondary development of automatic dialect classification applications.
Abstract (English, as published; truncated in the source): Objective To explore how network models represent acoustic characteristics for dialect identification, so as to select the optimal single model for an automatic dialect classifier. Methods Four typical neural network models for acoustic feature extraction, SOM (self-organizing feature map), RNN (recurrent neural network), LSTM (long short-term memory network) and CNN (convolutional neural network), were each simulated in Python. A dataset of typical dialects from 13 cities in Jiangsu province (6036 voice samples from 105 speakers) was split into training, validation and test sets at a ratio of 6:2:2. The test set was then cut into subsets of 3-second and 10-second clips, and white noise was added to each to give subsets with a signal-to-noise ratio (SNR) of 3 dB and 10 dB, yielding four test sets of 1207 samples each. Appropriate classifiers were chosen to evaluate the four models in training, validation and testing. For dialect identification, each network model was tested on its ability to extract features from test sets of different SNR and duration. Results With normalized data and network parameters, confusion matrices were obtained from the outputs of the four neural network models on the four test sets, and from these the Macro-F1 and Micro-F1 scores, which are suited to evaluating multi-class problems, were computed. The results showed that LSTM and CNN perform significantly better than SOM and RNN. SOM is clearly more sensitive to the SNR of the test samples, with poor identification accuracy on the 3 dB test set. RNN identifies dialects more accurately, yet represents the key information in long-duration samples insufficiently. LSTM achieves the best evaluation scores of 93.1% (Macro-F1
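The abstract mentions two steps that are easy to illustrate: adding white noise to reach a target SNR, and deriving Macro-F1 and Micro-F1 from a confusion matrix. The following is a minimal sketch of both, not the authors' code. The function names, the Gaussian noise model, the fixed seed and the choice of Macro-F1 as the mean of per-class F1 scores are all assumptions made for illustration; the paper may define or implement these differently.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus Gaussian white noise scaled to `snr_db` decibels.

    Hypothetical helper: the source does not say how the noise was generated.
    Assumes a non-empty signal with non-zero power.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(signal_power / noise_power), so solve for the
    # noise power that gives the requested SNR and rescale the raw noise.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def macro_micro_f1(confusion):
    """Compute (Macro-F1, Micro-F1) from a square confusion matrix.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j. Macro-F1 here is the unweighted mean of the
    per-class F1 scores (an assumed definition, one of several in use).
    """
    n = len(confusion)
    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                         # row minus diagonal
        fp = sum(confusion[i][k] for i in range(n)) - tp    # column minus diagonal
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        total_tp += tp
        total_fp += fp
        total_fn += fn
    macro = sum(per_class_f1) / n
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return macro, micro
```

In a single-label multi-class setting such as this one, every misclassified sample counts once as a false positive and once as a false negative, so Micro-F1 equals overall accuracy. Macro-F1 weights each dialect class equally regardless of how many samples it has.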