Research on Language Identification Technology for Multi-Languages with Scarce Resources

Authors: MAO Xue-li; Mijit Ablimit [1]; Askar Hamdulla [1]

Affiliation: [1] College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China

Source: Computer Simulation (《计算机仿真》), 2022, No. 12, pp. 336-341 (6 pages)

Funding: National Natural Science Foundation of China (No. 61662078); National Key Research and Development Program of China (2017YFC0820602).

Abstract: Existing language identification research concentrates on resource-rich languages within the same language family, while low-resource and cross-family languages have received comparatively little attention. To address this gap, this paper compares several features (such as MFCC, FBank, and spectrograms) and several models (such as CNN and GRU), and proposes a CNN-BiGRU language identification model based on spectrogram features. The model extracts the spectrogram of the speech data and uses a convolutional network to obtain the visual features of the spectrogram. A bidirectional gated recurrent network then captures temporal information, and a fully connected network outputs the language class. In this way the model performs language identification for low-resource languages, for languages within the same family, and for multiple languages across families. Experiments on the Oriental language dataset yield good results and verify the effectiveness of the method.
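The data flow described in the abstract (spectrogram, then convolution, then a bidirectional GRU, then a fully connected output layer) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, frame parameters, random untrained weights, the frequency-axis mean pooling, the GRU formulation, and the number of languages (`n_langs=10`) are all assumptions chosen only to show how the tensor shapes move through such a pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Log-power spectrogram of a 1-D signal, shape (n_frames, n_fft // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-10)

def conv2d_relu(image, kernels):
    """Valid 2-D cross-correlation followed by ReLU.
    image: (T, F); kernels: (C, kh, kw); returns (C, T - kh + 1, F - kw + 1)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernels.shape[1:])
    return np.maximum(np.einsum("tfhw,chw->ctf", windows, kernels), 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru(seq, W, U, b, hidden):
    """Run one common GRU formulation over seq (T, D); return the final state."""
    h = np.zeros(hidden)
    for x in seq:
        gx = W @ x + b
        z = sigmoid(gx[:hidden] + U[:hidden] @ h)                      # update gate
        r = sigmoid(gx[hidden:2 * hidden] + U[hidden:2 * hidden] @ h)  # reset gate
        n = np.tanh(gx[2 * hidden:] + U[2 * hidden:] @ (r * h))        # candidate
        h = (1.0 - z) * n + z * h
    return h

def init_params(n_channels=4, kernel=(3, 3), hidden=16, n_langs=10):
    """Random, untrained weights; every size here is a placeholder assumption."""
    def gru_params(dim):
        return (0.1 * rng.standard_normal((3 * hidden, dim)),
                0.1 * rng.standard_normal((3 * hidden, hidden)),
                np.zeros(3 * hidden), hidden)
    return {
        "kernels": 0.1 * rng.standard_normal((n_channels, *kernel)),
        "fwd": gru_params(n_channels),
        "bwd": gru_params(n_channels),
        "W_out": 0.1 * rng.standard_normal((n_langs, 2 * hidden)),
        "b_out": np.zeros(n_langs),
    }

def forward(signal, params):
    """Spectrogram -> CNN -> BiGRU -> fully connected softmax over languages."""
    spec = log_spectrogram(signal)
    spec = (spec - spec.mean()) / (spec.std() + 1e-8)   # global normalization
    feat = conv2d_relu(spec, params["kernels"])         # (C, T', F')
    seq = feat.mean(axis=2).T                           # pool frequency -> (T', C)
    h_fwd = gru(seq, *params["fwd"])                    # left-to-right pass
    h_bwd = gru(seq[::-1], *params["bwd"])              # right-to-left pass
    logits = params["W_out"] @ np.concatenate([h_fwd, h_bwd]) + params["b_out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# One second of synthetic noise at an assumed 16 kHz sampling rate.
probs = forward(rng.standard_normal(16000), init_params())
print(probs.shape, int(probs.argmax()))
```

Because the weights are random, the predicted class is meaningless. A trained model would learn the convolution kernels, GRU weights, and output layer from labeled speech, and would typically stack several convolution and pooling layers rather than the single layer used here.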

Keywords: language identification; spectrogram; low-resource; multilingual

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
