A Practical Exploration of AIGC-Powered Digital Humanities Research: A SikuGPT Driven Research of Ancient Poetry Generation

Cited by: 22


Authors: Liu Jiangfeng (刘江峰), Liu Chufei (刘雏菲), Qi Yue (齐月), Liu Liu (刘浏), Li Bin (李斌) [2], Liu Chang (刘畅) [1], Wang Dongbo (王东波) [1]

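Affiliations: [1] College of Information Management, Nanjing Agricultural University, Nanjing, Jiangsu 210095; [2] School of Chinese Language and Literature, Nanjing Normal University, Nanjing, Jiangsu 210024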

Source: Information Studies: Theory & Application (《情报理论与实践》), 2023, No. 5, pp. 23-31 (9 pages)

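Funding: An outcome of the Major Project of the National Social Science Fund of China, "Construction and Application of a Cross-Lingual Knowledge Base of Ancient Chinese Classics" (Grant No. 21&ZD331).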

Abstract: [Purpose/significance] Poetry composition is an important direction for natural language generation research in the digital humanities, with value for tasks such as judging disputed textual variants in the wording of classical poems and automatic poetry question answering. However, no pre-trained model capable of automatically generating classical poetry in traditional Chinese has yet emerged; existing research has focused on composing simplified-Chinese classical poems in different styles according to user needs. [Method/process] Using causal language modeling (CLM), this paper continues pre-training gpt2-chinese-cluecorpussmall on an unpunctuated traditional Chinese corpus of the Siku Quanshu and on a traditional Chinese corpus of classical poetry to build the SikuGPT2 and SikuGPT2-poem models. Model performance is verified with perplexity, BLEU, expert scoring, and a Turing test. [Result/conclusion] Experiments show that SikuGPT2-poem has relatively low perplexity; the poems it generates score about 0.053 lower on BLEU than those of the baseline model and average 1.93 points higher than the baseline in human scoring. Overall, the proposed model performs well and passes the Turing test. The pre-training corpus of the proposed series of generative models for classical Chinese is still small, and while the model performs well at generating classical poems (shi), it cannot yet meet the needs of genres such as fu and qu.
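The abstract names perplexity and BLEU as the automatic metrics but gives no formulas or code. The following is a minimal sketch of how both are commonly computed, not the authors' evaluation script. The function names `perplexity`, `char_ngrams`, and `bleu` are hypothetical, and treating each Chinese character as one token and stopping at bigrams are assumptions made for illustration.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities under a causal LM.

    Assumption: the caller has already obtained log p(token | prefix) for
    every token of the evaluated text from some language model.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def char_ngrams(text, n):
    """Count character n-grams (assumption: one character = one token)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Character-level BLEU against a single reference, uniform weights."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = char_ngrams(candidate, n)
        ref = char_ngrams(reference, n)
        total = sum(cand.values())
        # Clipped counts: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Lower perplexity means the model assigns higher probability to held-out text. BLEU rewards n-gram overlap with a reference poem, so a model that writes fluent but novel lines can score lower on BLEU while still being preferred by human judges, which is consistent with the pattern the abstract reports.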

Keywords: Siku Quanshu; SikuGPT; pre-trained language model; poetry generation; digital humanities

Classification: G250.7 [Cultural Science: Library Science]; TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
