Authors: FENG Huimin (冯慧敏); GUO Shuaishuai (郭帅帅); LIU Ming (刘铭)
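Affiliations: [1] Department of Basic Courses, Shandong Agricultural Engineering University, Jinan 250100, China; [2] Institute for Advanced Study in History of Science, Northwest University, Xi'an 710127, China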
Source: Science & Technology Review (《科技导报》), 2024, No. 23, pp. 135-144 (10 pages)
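Funding: Key Research and Development Program of Shaanxi Province (2019ZDLGY17-03); Northwest University Graduate Innovation Project (CX2023045); Shandong Agricultural Engineering University Research Start-up Fund (2024GCCZR-17).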
Abstract: Information processing of ancient Chinese seldom uses unearthed documents as a research corpus. The Liye Qin bamboo slips number ten times all the Qin slips unearthed before them and fill many gaps in the historical record of the Qin Dynasty. This paper uses the Liye Qin slips as an experimental corpus and explores automatic sentence segmentation and word segmentation based on the CRF (conditional random field) model. Drawing on the actual characteristics of the slip texts, different feature templates are set up to test how well the model's sequence labeling generalizes across tasks. A joint approach to sentence segmentation and word segmentation is evaluated in comparative experiments to select the better-performing scheme, and further comparative experiments are run against deep learning methods and pretrained models. The results show that the joint CRF labeling scheme improves overall performance in each task, with F1 scores of 75.79% for automatic sentence segmentation and 94.44% for word segmentation. Because it is also faster, this approach is better suited to the Liye Qin slips. The results can support the proofreading of the last three volumes of the Liye Qin bamboo slips and the in-depth processing and construction of the corpus.
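The joint labeling idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's tag set or feature templates, so everything below is an assumption for illustration: each character receives a BMES word-position tag plus a hypothetical suffix (`P` if the character ends a sentence, `O` otherwise), and the function name `joint_labels` and the placeholder characters are invented here. Per-character labels of this kind are what a CRF sequence labeler would be trained to predict.

```python
from typing import List, Tuple


def word_tags(word: str) -> List[str]:
    """Return BMES position tags for the characters of one word."""
    if len(word) == 1:
        return ["S"]  # single-character word
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def joint_labels(sentences: List[List[str]]) -> List[Tuple[str, str]]:
    """Turn segmented sentences into (character, joint label) pairs.

    `sentences` is a list of sentences, each a list of words.
    The joint label is "<BMES tag>-<P|O>", where P marks the last
    character of a sentence. This tag scheme is an assumption,
    not the scheme used in the paper.
    """
    labelled: List[Tuple[str, str]] = []
    for words in sentences:
        # Flatten the sentence into (character, word-position tag) pairs.
        chars = [(ch, tag) for w in words for ch, tag in zip(w, word_tags(w))]
        for i, (ch, tag) in enumerate(chars):
            boundary = "P" if i == len(chars) - 1 else "O"
            labelled.append((ch, f"{tag}-{boundary}"))
    return labelled


if __name__ == "__main__":
    # Placeholder characters, not text from the Liye slips.
    sample = [["甲", "乙丙"], ["丁戊己"]]
    for ch, label in joint_labels(sample):
        print(ch, label)
```

With a single label per character, one model pass recovers both word boundaries and sentence breaks, which is the property a joint scheme relies on; separate tasks would instead use the BMES tag or the boundary flag alone.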
Classification: K877.5 [History and Geography: Archaeology and Museology]; TP391.1 [History and Geography: History]