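Word Segmentation of Bamboo-Slip and Silk Manuscripts from a Digital Humanities Perspective: The Case of the Liye Qin Slips (《里耶秦简牍》)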

Research on Word Segmentation of Liye Qin Bamboo Slips from the Perspective of Digital Humanities


Authors: LIU Ming (刘铭); FENG Huimin (冯慧敏); CHEN Yiwen (陈镱文) [1,2]

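Affiliations: [1] Institute for Advanced Studies in History of Science, Northwest University (西北大学科学史高等研究院), Xi'an 710127; [2] Shaanxi Key Laboratory of Digital Humanities for Cultural Heritage (陕西省文化遗产数字人文重点实验室), Xi'an 710127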

Source: Applied Linguistics (《语言文字应用》), 2024, No. 3, pp. 132-144 (13 pages)

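Funding: Supported by the Shaanxi Provincial Key Research and Development Program project "Research on Intelligent Analysis and Utilization of Digital Cultural Resource Platforms" (2019ZDLGY17-03) and the Shaanxi Qin Chuang Yuan team-building project "'Scientist + Engineer' Team for Research, Development, and Application of Core Artificial Intelligence Technologies for Cultural Heritage from the Perspective of Digital Humanities" (2022KXJ-143).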

Abstract: Bamboo-slip and silk manuscripts are carriers of traditional Chinese culture that differ from the transmitted classics. Taking the two published volumes of the Liye Qin slips as an example, this paper applies the text computation and analysis methods of digital humanities to automatic word segmentation. A corpus of the Liye Qin slips (the Liye-Text-Corpus) is built from manually annotated slip texts, segmentation experiments are run with three types of word segmentation methods, and the results are compared and discussed. The Bi-LSTM-CRF model performs best, with a precision of 94.54%, a recall of 94.82%, and an F value of 94.68%. These results confirm the effectiveness and generalization ability of deep learning segmentation methods on bamboo-slip and silk manuscripts such as the Liye Qin slips, and indicate that such methods can serve tasks including vocabulary research on bamboo-slip and silk manuscripts, deeper corpus processing, and text analysis.
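The three reported figures can be related to one another with a short sketch. It assumes the usual word-level scoring for segmentation, in which a predicted word counts as correct only if both of its boundaries match the manual annotation, and it assumes the 94.54% figure is precision in that sense. The function names and the toy tokens below are illustrative and do not come from the paper.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    result, start = set(), 0
    for word in words:
        end = start + len(word)
        result.add((start, end))
        start = end
    return result


def f_value(precision, recall):
    """Harmonic mean of precision and recall (the F value)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold_words, predicted_words):
    """Word-level precision, recall, and F for one segmented sentence."""
    gold, predicted = spans(gold_words), spans(predicted_words)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_value(precision, recall)


# Toy example with placeholder tokens (not text from the Liye Qin slips).
gold = ["AB", "C", "DE"]
pred = ["AB", "CD", "E"]
p, r, f = evaluate(gold, pred)  # only "AB" matches on both boundaries

# Consistency check on the figures reported in the abstract.
reported_f = round(f_value(94.54, 94.82), 2)
```

Under these assumptions, the harmonic mean of 94.54 and 94.82 rounds to 94.68, which agrees with the F value stated in the abstract.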

Keywords: digital humanities; bamboo-slip and silk manuscripts; Liye Qin slips; automatic word segmentation; deep learning

Classification number: G255.1 [Cultural Sciences: Library Science]

 
