Authors: BAO Zhenshan (包振山)[1], SONG Bingyan (宋秉彦), ZHANG Wenbo (张文博)[1], SUN Chao (孙超)[2]
Affiliations: [1] College of Computer Science and Technology, Beijing University of Technology, Beijing 100124, China; [2] School of Traditional Chinese Medicine, Capital Medical University, Beijing 100069, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2022, No. 6, pp. 90-100 (11 pages)
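Funding: General Project of the Science and Technology Plan of the Beijing Municipal Education Commission (KM202110025021); Cui Xizhang Traditional Chinese Medicine Culture Inheritance Studio, under the Beijing Traditional Chinese Medicine "薪火传承3+3工程" (Inheritance 3+3 Project); Capital Medical University Research Cultivation Fund (PYZ19167).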
Abstract: Named entity recognition in ancient traditional Chinese medicine (TCM) texts has received little attention, and most existing work relies on supervised learning. These texts, however, are poorly digitized, annotated corpora are scarce, the language is mostly classical Chinese, and the professional terminology has kept evolving, so existing methods do not address the task effectively. Building on a purpose-built corpus of ancient TCM texts and an analysis of the entity names that occur in them, this paper proposes an entity recognition method that combines semi-supervised learning with rules. With the conditional random field (CRF) model as the basic framework, supervised features such as word, part-of-speech, and dictionary features are introduced together with unsupervised semantic features derived from word vectors. The recognition performance of different feature combinations is compared to determine the optimal semi-supervised model, which is then compared with other models. Finally, a rule base built on the linguistic characteristics of ancient texts is used for rule-based post-processing of the model output. The final F-score in the experiments reaches 83.18%, which demonstrates the effectiveness of the method.
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
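The pipeline outlined in the abstract (supervised and word-vector-derived features fed to a CRF, followed by rule-based post-processing) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `token_features` and `apply_suffix_rule`, the feature names, the idea of representing the word-vector feature as a cluster ID, the single suffix rule, and the example tokens. The features are built in the dict-per-token style that CRF toolkits such as sklearn-crfsuite accept; no CRF is trained here.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], pos_tags: List[str],
                   lexicon: Set[str], cluster_of: Dict[str, str],
                   i: int) -> Dict[str, object]:
    """Build a feature dict for token i (hypothetical feature template)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "word": tok,                             # supervised: word feature
        "pos": pos_tags[i],                      # supervised: part of speech
        "in_lexicon": tok in lexicon,            # supervised: dictionary feature
        "cluster": cluster_of.get(tok, "UNK"),   # unsupervised: assumed to be a
                                                 # cluster ID from word vectors
    }
    # Context window of one token on each side (an assumed window size).
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
    else:
        feats["EOS"] = True
    return feats


def apply_suffix_rule(tokens: List[str], labels: List[str],
                      suffixes: Set[str], entity_label: str) -> List[str]:
    """Post-process predicted labels: relabel a multi-character token tagged
    'O' as an entity when it ends with one of the given suffixes.
    This is an illustrative rule, not one from the paper's rule base."""
    fixed = list(labels)
    for i, (tok, lab) in enumerate(zip(tokens, labels)):
        if lab == "O" and len(tok) > 1 and tok.endswith(tuple(suffixes)):
            fixed[i] = "B-" + entity_label
    return fixed


# Illustrative input: "服 桂枝汤 愈" (take Guizhi decoction, recover).
tokens = ["服", "桂枝汤", "愈"]
pos_tags = ["v", "n", "v"]
feats = token_features(tokens, pos_tags, {"桂枝汤"}, {"桂枝汤": "c7"}, 1)

# A formula name missed by the model is recovered by the "汤" (decoction) suffix.
fixed = apply_suffix_rule(["服", "麻黄汤"], ["O", "O"], {"汤"}, "FORMULA")
```

In a fuller setup, one such feature dict per token would be passed to a CRF trainer, and the post-processing step would draw on a larger rule base; both are outside the scope of this sketch.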