Named Entity Recognition in Traditional Chinese Medicine Books Combining Semi-supervised Learning and Rule-based Approach

Cited by: 11

Authors: BAO Zhenshan [1]; SONG Bingyan; ZHANG Wenbo [1]; SUN Chao [2]

Affiliations: [1] College of Computer Science and Technology, Beijing University of Technology, Beijing 100124, China; [2] School of Traditional Chinese Medicine, Capital Medical University, Beijing 100069, China

Source: Journal of Chinese Information Processing, 2022, No. 6, pp. 90-100 (11 pages)

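Funding: General Project of the Science and Technology Program of the Beijing Municipal Education Commission (KM202110025021); Beijing Traditional Chinese Medicine "Inheritance 3+3 Project", CUI Xizhang Traditional Chinese Medicine Culture Inheritance Studio; Capital Medical University Research Cultivation Fund (PYZ19167).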

Abstract: Named entity recognition in ancient traditional Chinese medicine (TCM) books has received little attention, and most existing work relies on supervised learning. However, ancient books are poorly digitized, annotated corpora are scarce, the texts are mostly written in classical Chinese, and the professional terminology keeps evolving, so existing methods cannot address these problems effectively. Having built a corpus of ancient TCM books and analyzed the entity names in it, this paper proposes an entity recognition method that combines semi-supervised learning with rules. With the conditional random fields (CRF) model as the basic framework, supervised features such as word, part-of-speech, and dictionary features are introduced together with unsupervised semantic features derived from word vectors. The recognition performance of different feature combinations is compared to determine the optimal semi-supervised model, which is then compared with other models. Finally, a rule base is built from the linguistic characteristics of ancient books and applied as rule-based post-processing of the model output. The final F-score in the experiments reaches 83.18%, which demonstrates the effectiveness of the method.

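Keywords: semi-supervised learning; conditional random fields; named entity recognition; ancient traditional Chinese medicine books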

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
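
The pipeline described in the abstract (per-token features for a CRF, followed by rule-based correction of the predicted labels) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the toy dictionary `HERB_DICT`, the cluster table `CLUSTER_ID` (standing in for semantic classes obtained by clustering word vectors), the `HERB` label, and both post-processing rules are hypothetical, since the abstract does not list the actual feature templates or rule base.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy resources; the paper's real dictionary, word-vector
# clusters, and entity types are not given in the abstract.
HERB_DICT: Set[str] = {"甘草", "桂枝"}
CLUSTER_ID: Dict[str, int] = {"甘草": 3, "桂枝": 3, "主": 7, "治": 7}


def token_features(words: Sequence[str], pos_tags: Sequence[str], i: int) -> Dict[str, str]:
    """Build one feature dict for token i, mixing supervised features
    (word, part of speech, dictionary hit) with an unsupervised one
    (cluster id assumed to come from clustering word vectors)."""
    word = words[i]
    return {
        "word": word,
        "pos": pos_tags[i],
        "in_dict": str(word in HERB_DICT),
        "cluster": str(CLUSTER_ID.get(word, -1)),  # -1 marks an unseen word
        "prev_word": words[i - 1] if i > 0 else "<BOS>",
        "next_word": words[i + 1] if i < len(words) - 1 else "<EOS>",
    }


def sentence_features(words: Sequence[str], pos_tags: Sequence[str]) -> List[Dict[str, str]]:
    """Feature dicts for a whole sentence, in the shape most CRF toolkits accept."""
    return [token_features(words, pos_tags, i) for i in range(len(words))]


def apply_rules(words: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Rule-based post-processing of predicted BIO labels (illustrative rules only)."""
    fixed = list(labels)
    for i, word in enumerate(words):
        label = fixed[i]
        if label == "O" and word in HERB_DICT:
            # Rule 1 (assumed): recover a dictionary entity the model missed.
            fixed[i] = "B-HERB"
        elif label.startswith("I-"):
            # Rule 2 (assumed): an I- tag must continue an entity of the same type.
            prev = fixed[i - 1] if i > 0 else "O"
            if prev == "O" or prev[2:] != label[2:]:
                fixed[i] = "B-" + label[2:]
    return fixed


words = ["甘草", "二", "两"]
pos_tags = ["n", "m", "q"]
features = sentence_features(words, pos_tags)
corrected = apply_rules(words, ["O", "O", "O"])
```

In this sketch the feature dicts would be passed to a CRF trainer, and `apply_rules` would run on the decoder output; comparing feature combinations, as the abstract describes, amounts to dropping keys from the dict and retraining.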
