Multi-Granularity Feature Fusion for Named Entity Recognition of Classical Chinese Texts from the Perspective of Digital Humanities

Authors: Meng Jiana; Xu Yingao; Zhao Dandan; Li Fengyi; Zhao Di (School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600)

Affiliation: [1] School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600

Source: Knowledge Management Forum, 2024, Issue 6, pp. 533-546 (14 pages)

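Funding: This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Research Planning Fund project "Research on the Intelligent Internet Dissemination of Chinese Culture Based on Knowledge Graphs" (Project No. 23YJA860010) and the Fundamental Research Funds for the Central Universities project "Research on Sentiment Analysis Driven by Large Models and Knowledge" (Project No. 140250).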

Abstract: [Purpose/Significance] Using named entity recognition (NER) to mine ancient documents in depth advances the digitization of ancient Chinese texts, which is of great significance for promoting historical study, strengthening cultural confidence, and carrying forward traditional Chinese culture. [Method/Process] A multi-granularity feature fusion method for NER in classical Chinese texts is proposed. Using "Zuo Zhuan" as the research corpus, NER tasks are constructed for personal names, place names, time expressions, and other entity types. First, classical character information, part-of-speech (POS) information, and glyph features are fused to strengthen the input feature representation. Then, an auxiliary task that predicts entity heads and tails is added to learn boundary information in classical sentences, while a Transfer interactor heuristically learns the word-formation patterns of classical Chinese entities, and BiLSTM and IDCNN (Iterated Dilated Convolutional Neural Network) jointly extract contextual information. Finally, the learned features are fused by weighting and fed into a CRF (Conditional Random Field) for entity prediction. [Result/Conclusion] Experimental results show that, compared with the mainstream BERT-BiLSTM-CRF model, the proposed method improves precision, recall, and F1 score by 5.09%, 13.45%, and 9.87%, respectively. The multi-granularity feature fusion method can accurately recognize named entities in ancient texts.
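
The abstract reports precision, recall, and F1 but does not say how they are computed. As a minimal sketch, assuming entity-level exact-match scoring over BIO tags (a common convention for NER, not confirmed by the source), the three metrics could be computed as below. The function names `bio_to_spans` and `prf1`, the tag labels `PER` and `LOC`, and the toy sentence with its tags are all illustrative assumptions, not the paper's code, tag set, or data.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            # "O", or an "I-" tag with no matching open entity, is ignored.
            start, label = None, None
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span and type match."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-made toy example for the six characters 郑 伯 克 段 于 鄢.
gold = [["B-PER", "I-PER", "O", "B-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O", "O", "B-LOC"]]  # misses the second name

p, r, f = prf1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=1.00 R=0.67 F1=0.80
```

In this toy case both predicted entities are correct (precision 1.00) but one of the three gold entities is missed (recall 0.67), which shows how a model can gain far more in recall than in precision, as in the figures reported above.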

Keywords: digital humanities; classical Chinese; entity recognition; multi-granularity; feature fusion

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
