End-to-end Speech Translation for Low-Resource Languages Based on Target Language Pre-Training and Joint Decoding

Authors: LI Ning; ZHU Liping [1,2,3]; ZHAO Xiaobing [1,2,3]; RENZENG Zhuoma; WANG Yanmin [1,3]

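Affiliations: [1] School of Information Engineering, Minzu University of China, Beijing 100081, China; [2] National Language Resource Monitoring & Research Center of Minority Languages, Minzu University of China, Beijing 100081, China; [3] Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China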

Source: Journal of Chinese Information Processing, 2023, No. 12, pp. 36-43 (8 pages)

Funding: National Social Science Fund of China (17BGL199); Graduate Exemplary Demonstration Course Program of Minzu University of China (GRSCP202316).

Abstract: Automatic speech translation (AST) converts speech in a source language into text in a target language. End-to-end speech translation has become the mainstream approach in AST research, but it suffers from data scarcity. This paper first constructs a 20-hour Uyghur-Chinese AST dataset using machine translation followed by manual verification. Second, to improve the end-to-end model, the model is pre-trained on a comparatively resource-rich target-language speech recognition dataset; this resolves the failure to converge caused by data scarcity and also lets the model acquire linguistic knowledge of the target language. Third, a mapping module is added in front of the pre-trained decoder so that the model learns the mapping from source-language knowledge to target-language knowledge, which yields the end-to-end speech translation model. Finally, joint CTC and attention decoding is used to enforce alignment between speech and labels and to improve translation quality. Experiments show that the method achieves a BLEU score of 61.45 on the Uyghur-Chinese speech translation dataset.
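The abstract does not say how the CTC and attention scores are combined. One common formulation of joint decoding, assumed here purely for illustration, interpolates the two log-probabilities with a weight: score = w * log p_ctc(y|x) + (1 - w) * log p_att(y|x). The Python sketch below rescores complete hypotheses with that rule, which is a simplification of the beam-search integration usually used in practice. The function names, the blank index 0, and the default weight 0.3 are all hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_prob(log_probs, labels, blank=0):
    """log p_ctc(labels | x) computed with the CTC forward algorithm.

    log_probs[t][k] is the log-probability of symbol k at frame t.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    size = len(ext)

    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    if size == 1:
        return alpha[0]
    return logsumexp(alpha[-1], alpha[-2])


def joint_score(att_log_prob, ctc_lp, ctc_weight=0.3):
    """Interpolated score; ctc_weight is an assumed value in (0, 1]."""
    return ctc_weight * ctc_lp + (1.0 - ctc_weight) * att_log_prob


def rescore(hyps, log_probs, ctc_weight=0.3):
    """Pick the best (labels, attention_log_prob) pair under the joint score."""
    return max(
        hyps,
        key=lambda h: joint_score(h[1], ctc_log_prob(log_probs, h[0]), ctc_weight),
    )
```

In this sketch a hypothesis that cannot be aligned to the available frames receives a CTC log-probability of negative infinity, so it is rejected even when the attention decoder scores it highly. That is the sense in which the CTC term enforces alignment between speech and labels.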

Keywords: speech translation; end-to-end; dataset construction

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
