Chinese-Urdu Neural Machine Translation Incorporating Urdu Part-of-Speech Sequence Prediction

Published English title: Chinese-Urdu neural machine translation interacting POS sequence prediction in Urdu language


Authors: CHEN Huan-huan; WANG Jian [1,2]; Muhammad Naeem Ul Hassan [1,2]

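Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China; [2] Key Laboratory of Artificial Intelligence in Yunnan Province, Kunming University of Science and Technology, Kunming 650500, Yunnan, China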

Source: Computer Engineering & Science (《计算机工程与科学》), 2024, No. 3, pp. 518-524 (7 pages)

Funding: National Natural Science Foundation of China (62166022, 62266028).

Abstract: Many research teams have studied machine translation for the less widely used languages of South and Southeast Asia in depth. Urdu, however, although it is an official language of Pakistan, has scarce data resources and differs greatly from Chinese, so research targeted at Chinese-Urdu machine translation remains very limited. To address this, the paper proposes a Transformer-based Chinese-Urdu neural machine translation model that incorporates Urdu part-of-speech (POS) sequences. First, a Transformer predicts the POS sequence of the target-language (Urdu) sentence. The translation model's predictions are then combined with the POS sequence model's predictions to make a joint prediction, which integrates linguistic knowledge into the translation model. Experiments on an existing small-scale Chinese-Urdu dataset show that the proposed method improves the BLEU score by 0.13 over the baseline model, which the authors report as a fairly clear improvement.
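The abstract does not say how the two models' predictions are combined. As a rough illustration only, the sketch below assumes a log-linear mix at a single decoding step: each candidate token's translation probability is multiplied by the POS model's probability for that token's tag, raised to a weight. The function name `joint_scores`, the `weight` parameter, the token-to-tag lookup, and the toy romanized vocabulary are all hypothetical and are not taken from the paper.

```python
import math

def joint_scores(token_probs, pos_probs, token_to_pos, weight=1.0, eps=1e-9):
    """Mix translation-model and POS-model predictions for one decoding step.

    Hypothetical log-linear combination (an assumption, not the paper's formula):
        score(token) = log p_nmt(token) + weight * log p_pos(tag(token))
    The scores are renormalised so the result is a probability distribution.
    """
    scores = {}
    for token, p_tok in token_probs.items():
        tag = token_to_pos[token]          # POS tag assumed for this token
        p_tag = pos_probs.get(tag, eps)    # POS model's probability of that tag
        scores[token] = math.log(max(p_tok, eps)) + weight * math.log(max(p_tag, eps))
    # Renormalise in log space
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {token: math.exp(s - log_z) for token, s in scores.items()}

# Toy example: the translation model slightly prefers the verb,
# while the POS model strongly expects a noun at this position.
token_probs = {"kitab": 0.4, "parhna": 0.6}      # toy romanized Urdu tokens
token_to_pos = {"kitab": "NOUN", "parhna": "VERB"}
pos_probs = {"NOUN": 0.9, "VERB": 0.1}

mixed = joint_scores(token_probs, pos_probs, token_to_pos, weight=1.0)
best = max(mixed, key=mixed.get)
print(best, round(mixed[best], 3))
```

With `weight=0` the POS model is ignored and the translation model's own ranking is returned unchanged, which gives a simple way to compare against a baseline in this toy setting.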

Keywords: Transformer; neural machine translation; Urdu; part-of-speech sequence

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
