Construction of a Chinese Classical-Modern Translation Model Based on Pre-trained Language Models


Authors: Wu Mengcheng, Liu Chang [1,2,3], Meng Kai, Wang Dongbo

Affiliations: [1] College of Information Management, Nanjing Agricultural University, Nanjing 210095; [2] Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095; [3] Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University, Nanjing 210095; [4] School of Marxism, Nanjing Agricultural University, Nanjing 210095

Source: Journal of Information Resources Management (《信息资源管理学报》), 2024, No. 6, pp. 143-155 (13 pages)

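Funding: This work is a research output of the National Social Science Fund of China Major Project "Construction and Application of a Cross-Language Knowledge Base of Ancient Chinese Classics" (21&ZD331).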

Abstract: This study constructs and validates a translation model from classical to modern Chinese based on pre-trained language models, with the aim of providing technical support for research on ancient Chinese and for the inheritance and dissemination of cultural heritage. A total of 300,000 pairs of carefully processed parallel sentences from the "Twenty-Four Histories" serve as the experimental dataset, on which a new translation model, Siku-Trans, was developed. The model combines Siku-RoBERTa as the encoder and Siku-GPT as the decoder, both designed specifically for ancient Chinese, to form an encoder-decoder architecture. To evaluate Siku-Trans, the study uses three models as control groups: OpenNMT, SikuGPT, and SikuBERT_UNILM. A comparative analysis of each model's performance on the ancient Chinese translation task shows that Siku-Trans has significant advantages in both translation accuracy and fluency. These results highlight the effectiveness of combining Siku-RoBERTa with Siku-GPT as a training strategy and offer a reference for further research and practical applications in ancient Chinese translation.

Keywords: language model; machine translation; ancient Chinese translation; Siku-RoBERTa; Siku-GPT; Siku-Trans

Classification codes: I206.2 [Literature: Chinese Literature]; G202 [Cultural Science: Communication Studies]

 
