Named Entity Recognition of TCM Patents Based on RoBERTa-WWM

Authors: DENG Na[1], HE Xinyang, XIONG Caiquan[1], ZONG Zehua (School of Computer Science, Hubei University of Technology, 430068, China)

Affiliation: [1] School of Computer Science, Hubei University of Technology, Wuhan, Hubei 430068, China

Source: Journal of Hubei University of Technology, 2025, No. 1, pp. 55-60, 75 (7 pages)

Abstract: Component and function entities in traditional Chinese medicine (TCM) invention patents are complex in type and highly ambiguous. Traditional named entity recognition methods cannot fully capture semantic feature representations or context information, and they handle polysemy poorly. To address these problems, a named entity recognition model for TCM invention patents is proposed. The model connects three modules in series: the RoBERTa-WWM pre-trained model, a bidirectional long short-term memory (BiLSTM) network, and a conditional random field (CRF). Patent abstracts first pass through RoBERTa-WWM, which extracts semantics and generates semantic word embeddings that carry prior knowledge; the BiLSTM network then enhances the contextual feature information in the embeddings; finally, the CRF decodes the sequence and outputs the result with the highest probability. Experiments on a corpus of real TCM invention patent texts show that, compared with other mainstream methods, the model improves the F1 value (F-measure) by 5.80% for component recognition and by 6.63% for function recognition, and that it effectively improves the accuracy of recognizing drug components and functions in TCM invention patent abstracts.
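The final stage described in the abstract, CRF decoding that outputs the highest-probability tag sequence, is commonly implemented with the Viterbi algorithm. The following is a minimal pure-Python sketch of that decoding step only, not the authors' implementation. The BIO tag names in `TAGS`, the function names, and the score layout (per-position emission scores such as a BiLSTM might produce, plus a tag-to-tag transition matrix) are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical BIO tag set for the two entity types named in the abstract
# (components and functions); the paper's actual label scheme is not given.
TAGS = ["O", "B-COMP", "I-COMP", "B-FUNC", "I-FUNC"]


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
    start: List[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag index path and its score.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    start[j]          : score of beginning the sequence with tag j
    """
    n_tags = len(start)
    # Best score of any path that ends in tag j at position 0.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptr: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Follow the back-pointers from the best final tag to recover the path.
    last = max(range(n_tags), key=lambda j: score[j])
    best_score = score[last]
    path = [last]
    for ptr in reversed(backptr):
        last = ptr[last]
        path.append(last)
    path.reverse()
    return path, best_score


def to_tags(path: List[int], tags: List[str] = TAGS) -> List[str]:
    """Map tag indices back to tag names."""
    return [tags[i] for i in path]
```

In a trained model, the transition scores are learned, which lets the CRF penalize invalid label sequences (for example, an `I-COMP` that directly follows `O`) instead of choosing each tag independently.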

Keywords: TCM invention patents; named entity recognition; RoBERTa-WWM; BiLSTM

Classification code: TP312 [Automation and Computer Technology: Computer Software and Theory]

 
