Authors: LUO Yuanyuan (罗媛媛); YANG Chunming (杨春明) [2,4]; LI Bo (李波); ZHANG Hui (张晖) [3]; ZHAO Xujian (赵旭剑) [2,4]
Affiliations: [1] School of Computer and Software, Chengdu Neusoft Institute of Information, Chengdu 611844, China; [2] School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China; [3] School of Mathematics and Physics, Southwest University of Science and Technology, Mianyang 621000, China; [4] Sichuan Big Data and Intelligent System Engineering Technology Research Center, Mianyang 621010, China
Source: Journal of Taiyuan University of Technology (《太原理工大学学报》), 2024, No. 1, pp. 204-213 (10 pages)
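Funding: Key R&D Project of the Sichuan Provincial Department of Science and Technology (2021YFG0031); Sichuan Provincial Research Institutes Scientific and Technological Achievements Transformation Project (22YSZH0021).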
Abstract: 【Purposes】Event extraction is a prerequisite for building high-quality event knowledge graphs. In clinical event extraction, event elements depend on one another; existing methods cannot accurately identify the event elements and combine them into events, and labeled clinical event data are scarce, which makes the task very challenging. 【Methods】Clinical event extraction is modeled as an entity recognition task, and BERT-MCRF, a Chinese medical event extraction method that fuses multiple features, is proposed. The method uses Bidirectional Encoder Representations from Transformers (BERT) to build the embedding and feature extraction parts of the model and adds multi-character sliding-window features to the Conditional Random Field (CRF) layer. BERT-MCRF then serves as the base experiment for the semi-supervised experiments: a high-confidence pseudo-label data selection algorithm is proposed as the criterion for filtering data, the 300 higher-quality samples it yields are merged with the original data to build a corpus of 1,700 samples, and the model is retrained. 【Findings】The BERT-MCRF model reaches an overall F1 value of 80.21% on the three attribute entity types, 15.11% higher than the classical Bi-directional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model; the model retrained with the semi-supervised approach reaches a final F1 value of 81.56%, 1.35% higher than the original BERT-MCRF.
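The abstract names a high-confidence pseudo-label selection step but does not give its criterion. The Python sketch below shows one common way such a filter could work, not the authors' algorithm. It assumes the tagger returns a per-token probability distribution for each unlabeled sentence, scores a sentence by its least confident token, and keeps sentences at or above a threshold. The function names, the `threshold` value, the min-based score, and the tag names are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One prediction: (sentence text, predicted tags, per-token probability distributions).
Prediction = Tuple[str, List[str], Sequence[Sequence[float]]]


def sentence_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Score a tagged sentence by its least confident token (assumed criterion)."""
    if not token_probs:
        return 0.0
    return min(max(dist) for dist in token_probs)


def select_pseudo_labeled(
    predictions: Sequence[Prediction],
    threshold: float = 0.9,           # hypothetical cut-off, not from the paper
    max_items: Optional[int] = None,  # optional cap on the number of kept sentences
) -> List[Tuple[str, List[str]]]:
    """Keep sentences whose confidence is at least `threshold`, most confident first."""
    scored = [
        (sentence_confidence(probs), text, tags)
        for text, tags, probs in predictions
    ]
    kept = [item for item in scored if item[0] >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    if max_items is not None:
        kept = kept[:max_items]
    return [(text, tags) for _, text, tags in kept]


if __name__ == "__main__":
    # Toy predictions with made-up tags and probabilities.
    unlabeled = [
        ("s1", ["O", "B-X"], [[0.95, 0.05], [0.02, 0.98]]),
        ("s2", ["O", "O"], [[0.60, 0.40], [0.90, 0.10]]),
        ("s3", ["B-X", "O"], [[0.01, 0.99], [0.97, 0.03]]),
    ]
    for text, tags in select_pseudo_labeled(unlabeled):
        print(text, tags)
```

In a semi-supervised loop of the kind the abstract describes, the selected sentences would be merged with the original labeled data before retraining. Scoring by the minimum token probability is a conservative choice: one uncertain token is enough to reject the whole sentence.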
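Keywords: clinical medical event extraction; entity recognition; multi-feature; semi-supervised learning; high-confidence pseudo-label selection algorithm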
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]