Study on the Construction of Intelligent Machine Translation Model for TCM Ancient Books
(Chinese title: 中医古籍智能机器翻译模型构建研究)

Authors: SONG Xiyue (宋熹玥), ZHOU Jing (周净), LIU Wei (刘伟) (School of Informatics, Hunan University of Chinese Medicine, Changsha 410208, China)

Affiliation: [1] School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, Hunan, China

Source: Chinese Journal of Library and Information Science for Traditional Chinese Medicine, 2024, No. 6, pp. 130-135 (6 pages)

Funding: Natural Science Foundation of Hunan Province (2022JJ30438); Natural Science Foundation of Changsha (kq2202260); Scientific Research Program of Traditional Chinese Medicine of Hunan Province (B2023039).

Abstract: Objective To construct a scientific and standardized intelligent machine translation model for TCM ancient books that accurately translates them into modern Chinese or English, providing a reference for clinical medical learning and the dissemination of TCM. Methods First, machine translation of TCM ancient books was studied, and the initial experiments constructed a sentence-level parallel corpus containing 969,754 parallel sentence pairs. Second, a Seq2Seq model with an attention mechanism (Seq2Seq+Attention) was built, and a pre-trained Seq2Seq model (Pre-Training+Seq2Seq) was trained on 800,000 classical Chinese poems. Finally, experiments were conducted on the constructed dataset, with BLEU1, BLEU2 and F1 as evaluation metrics to verify the effectiveness of the models and the feasibility of further optimization. Results The F1 score of the Pre-Training+Seq2Seq model reached 65.72%. Conclusion The Pre-Training+Seq2Seq model performs well and provides an approach for the intelligent machine translation of TCM ancient books.
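
The Methods section names the core architecture: a Seq2Seq encoder-decoder with an attention mechanism, pre-trained before being applied to the parallel corpus. The paper's code is not part of this record, so the following PyTorch sketch is illustrative only; the GRU layers, the additive (Bahdanau-style) attention, and all sizes (emb_dim=256, hid_dim=512, vocab_size=8000) are assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) token ids of the classical-Chinese source
        outputs, hidden = self.gru(self.embed(src))
        return outputs, hidden            # (batch, src_len, hid), (1, batch, hid)

class Attention(nn.Module):
    """Additive attention over encoder outputs (an assumed variant)."""
    def __init__(self, hid_dim=512):
        super().__init__()
        self.attn = nn.Linear(hid_dim * 2, hid_dim)
        self.v = nn.Linear(hid_dim, 1, bias=False)

    def forward(self, hidden, enc_outputs):
        # hidden: (1, batch, hid) decoder state; enc_outputs: (batch, src_len, hid)
        src_len = enc_outputs.size(1)
        h = hidden.permute(1, 0, 2).repeat(1, src_len, 1)
        scores = self.v(torch.tanh(self.attn(torch.cat((h, enc_outputs), dim=2))))
        weights = torch.softmax(scores, dim=1)          # normalize over src_len
        return (weights * enc_outputs).sum(dim=1)       # context: (batch, hid)

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attention = Attention(hid_dim)
        self.gru = nn.GRU(emb_dim + hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden, enc_outputs):
        # token: (batch, 1) previously generated target token; one decoding step
        context = self.attention(hidden, enc_outputs).unsqueeze(1)
        output, hidden = self.gru(torch.cat((self.embed(token), context), dim=2), hidden)
        return self.out(output.squeeze(1)), hidden      # logits over target vocab

# One illustrative decoding step with random data
enc, dec = Encoder(vocab_size=8000), Decoder(vocab_size=8000)
src = torch.randint(0, 8000, (2, 12))                   # batch of 2 source sentences
enc_out, hidden = enc(src)
logits, hidden = dec(torch.zeros(2, 1, dtype=torch.long), hidden, enc_out)
```

Under the Pre-Training+Seq2Seq setup described in the abstract, the same encoder-decoder weights would presumably first be trained on the 800,000 classical poems and then fine-tuned on the 969,754-pair TCM corpus.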
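
The abstract also names the evaluation metrics: BLEU1, BLEU2 and F1. It does not say which BLEU variant or tokenization the authors used, so the self-contained sketch below takes one plausible reading: corpus-level clipped n-gram precision of a single order with a brevity penalty, and a micro-averaged token-overlap F1, over character-tokenized sentences. The example sentences are invented for illustration.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    # Multiset of n-grams occurring in a token sequence
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_n(hypotheses, references, n):
    """Corpus-level BLEU using only the clipped n-gram precision of order n
    with a brevity penalty (one reading of "BLEU1"/"BLEU2"; some
    implementations instead combine orders 1..n geometrically)."""
    match = total = hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match += sum(min(c, r[g]) for g, c in h.items())  # clipped counts
        total += max(sum(h.values()), 1)
        hyp_len += len(hyp)
        ref_len += len(ref)
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * match / total

def token_f1(hypotheses, references):
    """Micro-averaged F1 over token overlap between hypotheses and references."""
    tp = fp = fn = 0
    for hyp, ref in zip(hypotheses, references):
        overlap = sum((Counter(hyp) & Counter(ref)).values())
        tp += overlap
        fp += len(hyp) - overlap
        fn += len(ref) - overlap
    precision, recall = tp / max(tp + fp, 1), tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

# Invented example: classical-Chinese source rendered into modern Chinese
hyps = [list("上古圣人教导人民")]      # model output, character-tokenized
refs = [list("上古时期圣人教导百姓")]  # reference translation
print(bleu_n(hyps, refs, 1), bleu_n(hyps, refs, 2), token_f1(hyps, refs))
```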

Keywords: TCM ancient books; classical Chinese; corpus; text alignment; machine translation

CLC number: R2-05 [Medicine & Health - Traditional Chinese Medicine]
