面向古籍数字人文的《资治通鉴》自动摘要研究——以SikuBERT预训练模型为例 (Cited by: 13)

Automatic Summarization of ZiZhi TongJian from the Perspective of Digital Humanities Based on Ancient Chinese Books: A Case of SikuBERT Pre-training Model


Authors: 徐润华 XU Runhua [1]; 王东波 WANG Dongbo [2]; 刘欢 LIU Huan; 梁媛 LIANG Yuan; 陈康 CHEN Kang

Affiliations: [1] Jinling Institute of Technology; [2] College of Information Management, Nanjing Agricultural University

Source: Library Tribune (《图书馆论坛》), 2022, No. 12, pp. 129-137 (9 pages)

Funding: Major Project of the National Social Science Fund of China, "Construction and Application of a Cross-lingual Knowledge Base of Ancient Chinese Classics" (Project No. 21&ZD331); Jiangsu University Philosophy and Social Science Research Project, "Construction and Knowledge Mining of a Chunk-level Chinese-English Parallel Corpus Based on CSSCI" (Project No. 2018SJA0473).

Abstract: Automatic summarization can lower the cost of information acquisition, which is especially valuable for ancient texts marked by their great length, short sentences, and high barriers to comprehension; yet studies on the automatic summarization of ancient Chinese texts remain scarce. Using ZiZhi TongJian (《资治通鉴》) as the research corpus, this paper conducts automatic summarization experiments based on the SikuBERT pre-trained model and compares its performance with that of a traditional extraction-based summarization algorithm and Baidu's cloud-based intelligent summarization service. The results indicate that summaries generated with the SikuBERT pre-trained model are superior in stability and coverage, and receive the highest average score in expert manual rating. The experiments verify the feasibility of applying digital humanities techniques to the automatic summarization of ancient Chinese texts and the suitability of the SikuBERT pre-trained model for information processing of Classical Chinese.
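The "traditional extraction-based summarization algorithm" used as a baseline in the comparison can be illustrated with a minimal frequency-scoring sketch. Everything here is an illustrative assumption, not the paper's actual method: the function name, the choice of sentence delimiters, and the character-level scoring heuristic (a crude proxy for Classical Chinese, where many words are single characters) are all our own.

```python
import re
from collections import Counter

def extractive_summary(text: str, num_sentences: int = 2) -> list[str]:
    """Pick the top-scoring sentences as a summary.

    Each sentence is scored by the average corpus frequency of the
    characters it contains; the highest-scoring sentences are returned
    in their original document order.
    """
    # Split on common Chinese and Western sentence-ending punctuation.
    sentences = [s for s in re.split(r"[。！？.!?]", text) if s.strip()]
    # Character-level frequencies over the whole text.
    freq = Counter(ch for s in sentences for ch in s if not ch.isspace())
    # (score, original position, sentence) triples.
    scores = [
        (sum(freq[ch] for ch in s) / len(s), i, s)
        for i, s in enumerate(sentences)
    ]
    # Keep the top-scoring sentences, then restore document order.
    top = sorted(sorted(scores, reverse=True)[:num_sentences],
                 key=lambda t: t[1])
    return [s for _, _, s in top]
```

A pre-trained-model approach such as the paper's SikuBERT pipeline would replace the frequency score with sentence representations from the model, but the select-and-reorder skeleton above is the shape shared by extractive baselines of this kind.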

Keywords: digital humanities; SikuBERT; pre-trained model; automatic summarization

CLC Classification: G255.1 [Culture and Science: Library Science]; G250.7
