Cross-lingual Speech Recognition Model Based on Feature Prompting


Authors: WANG Jia-wen (王嘉文); GAO Ding-guo (高定国); SUOLANG Qu-zhen (索朗曲珍) [1,2]; NI Qiong (尼琼) (School of Information Science and Technology, Tibet University, Lhasa 850000, China; Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Tibet University, Lhasa 850000, China)

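Affiliations: [1] School of Information Science and Technology, Tibet University, Lhasa 850000; [2] Tibetan Information Technology Innovative Talent Cultivation Demonstration Base, Tibet University, Lhasa 850000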

Source: Science Technology and Engineering (《科学技术与工程》), 2024, No. 24, pp. 10348-10355 (8 pages)

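Funding: National Natural Science Foundation of China (62166038); Sichuan Science and Technology Program (2023YFQ0044); Tibet University High-Level Talent Training Program (2021-GSP-S126).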

Abstract: Cross-lingual speech recognition uses data from multiple source languages to train a system that recognizes speech in a target language, which can promote communication and understanding across languages and cultures. The task raises several problems: how to use multilingual data to improve recognition performance for low-resource languages, how to handle domain shift or interference between source and target languages, and how to balance task weights and data distributions across languages. To address these problems, a cross-lingual speech recognition model based on feature prompting is studied. To simplify the traditional process, in which specialists must annotate phonemes under a unified scheme, a cross-lingual annotation method is studied that marks each item of the original data with its language. Experiments were conducted on two public datasets. The results show that the proposed model lowers the average error rate by 46.44% compared with Conformer, a current mainstream speech recognition model, and by 2.1% compared with the baseline model, achieving high recognition accuracy. The results offer new ideas and methods for the field of cross-lingual speech recognition.

Keywords: feature prompting; cross-lingual; speech recognition; Conformer; ContextNet

Classification No.: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
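
Illustrative sketch: the abstract describes an annotation method that marks each item of the original data with its language instead of requiring a unified phoneme annotation. The Python snippet below is one possible reading of that idea, not the authors' implementation. The `Utterance` type, the `<zh>`-style marker format, the file names, and the character-level vocabulary are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    audio_path: str   # hypothetical path to the audio file
    transcript: str   # original transcript, left unchanged
    language: str     # language code, e.g. "zh" or "en" (assumed format)


def tag_transcript(utt: Utterance) -> str:
    """Prefix the transcript with a language marker such as "<zh>"."""
    return f"<{utt.language}> {utt.transcript}"


def build_vocab(utterances):
    """Build one shared vocabulary: language markers first, then characters."""
    tags = sorted({f"<{u.language}>" for u in utterances})
    chars = sorted({ch for u in utterances
                    for ch in u.transcript if not ch.isspace()})
    return {tok: idx for idx, tok in enumerate(tags + chars)}


# Toy corpus with made-up file names; no phoneme-level labels are needed.
corpus = [
    Utterance("a.wav", "你好", "zh"),
    Utterance("b.wav", "hello", "en"),
]
tagged = [tag_transcript(u) for u in corpus]
vocab = build_vocab(corpus)
```

Under this assumed scheme the only extra annotation per utterance is its language code, so data from several languages can be pooled without a specialist mapping every transcript to a common phoneme set.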

 
