Authors: LIU Ming (刘铭); FENG Huimin (冯慧敏); CHEN Yiwen (陈镱文) [1,2]
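Affiliations: [1] Institute for Advanced Studies in the History of Science, Northwest University, Xi'an 710127; [2] Shaanxi Key Laboratory of Digital Humanities for Cultural Heritage, Xi'an 710127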
Source: Applied Linguistics (《语言文字应用》), 2024, No. 3, pp. 132-144 (13 pages)
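Funding: Supported by the Shaanxi Province Key Research and Development Program project "Research on Intelligent Analysis and Utilization of Digital Cultural Resource Platforms" (2019ZDLGY17-03) and the Shaanxi Qinchuangyuan Team Building Project "'Scientist + Engineer' Team for the Research, Development and Application of Core Artificial Intelligence Technologies for Cultural Heritage from a Digital Humanities Perspective" (2022KXJ-143).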
Abstract: Bamboo and silk manuscripts are a carrier of traditional culture distinct from the transmitted classics. Taking the two published volumes of the Liye Qin Slips as an example, this paper applies text-data computation and analysis methods from the digital humanities to study their automatic word segmentation. A corpus, the Liye-Text-Corpus, is built from manually annotated Liye Qin Slips text; experiments are run with three types of word segmentation methods, and the results are compared and discussed. The Bi-LSTM-CRF model performs best, with a precision of 94.54%, a recall of 94.82%, and an F value of 94.68%. The results confirm the effectiveness and generalization ability of deep-learning word segmentation on bamboo and silk manuscripts such as the Liye Qin Slips, and indicate that it can serve downstream tasks such as vocabulary research on these manuscripts, deeper corpus processing, and text analysis.
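The three figures reported in the abstract can be cross-checked against each other. The sketch below assumes that the F value is the balanced F-measure (the harmonic mean of the two rates) and that the first figure, reported as an "accuracy rate", is precision in the usual word segmentation sense; the abstract states neither explicitly. The function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both rates are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for the Bi-LSTM-CRF model, in percent.
reported_precision = 94.54
reported_recall = 94.82

f_value = f_measure(reported_precision, reported_recall)
print(f"{f_value:.2f}")  # 94.68
```

Under these assumptions the computed value agrees with the reported F value of 94.68%.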