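Affiliations: [1] School of Mathematics and Computational Science, Hunan University of Science and Technology, Xiangtan, Hunan 411201; [2] Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing 100700; [3] Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700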
Source: Chinese Journal of Experimental Traditional Medical Formulae (《中国实验方剂学杂志》), 2024, No. 24, pp. 167-173 (7 pages)
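Funding: National Key Research and Development Program of China (2023YFC3503404); Independent Research Project of the China Academy of Chinese Medical Sciences (Z0643).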
Abstract: Objective: To improve the recognition accuracy of named entities in medical record texts and enable effective mining and use of medical record knowledge, a Bert-Radical-Lexicon (BRL) neural network model was constructed to recognize medical record entities, tailored to the characteristics of such texts. Methods: A total of 408 medical records related to hypertension were selected from the Complete Library of Famous Medical Records of Chinese Dynasties (《中华历代名医医案全库》), and a dataset of 1672 corpus entries was built by manual annotation. The entries were randomly divided into three subsets: a training set (1004 entries), a test set (334 entries), and a validation set (334 entries). On this basis, the BRL model, which fuses multiple text features of medical records, was built together with its variants BRL-B, BRL-L, and BRL-R and a baseline model, Base. The models were trained on the training set; to reduce the risk of overfitting, the performance of each model on the validation set was monitored continuously during training, and the best-performing model was saved. Finally, the models were evaluated on the test set. Results: Compared with the other models, the BRL model performed best on the medical record named entity recognition task. Across eight entity types (disease, symptom, tongue manifestation, pulse condition, syndrome, treatment method, prescription, and Chinese medicinal), its overall precision was 90.09%, its recall was 90.61%, and the harmonic mean of precision and recall (F1) was 90.35%. Compared with the Base model, the BRL model improved the overall F1 by 5.22%; the largest gain was for pulse condition entities, whose F1 rose by 6.92%. Conclusion: By incorporating multiple medical record text features in the embedding layer, the BRL neural network model achieves stronger named entity recognition and can therefore extract more accurate and reliable traditional Chinese medicine (TCM) clinical information.
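The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is not the authors' evaluation code; the helper name `f1_score` is a hypothetical chosen for illustration. It only recomputes F1 as the harmonic mean of the precision and recall stated in the abstract and confirms that the three subset sizes add up to the stated dataset size.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall reported in the abstract, in percent.
precision, recall = 90.09, 90.61
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")

# Subset sizes reported in the abstract: training, test, validation.
train_size, test_size, val_size = 1004, 334, 334
total = train_size + test_size + val_size
print(f"total entries = {total}")
```

Rounded to two decimals, the recomputed F1 matches the 90.35% given in the abstract, and the subset sizes sum to the 1672 annotated entries.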
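Keywords: named entity recognition; pre-trained model; radical embedding; associated-word embedding; medical records of famous physicians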
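Classification codes: R22 [Medicine and Health: Basic Theory of Traditional Chinese Medicine]; R28 [Medicine and Health: Traditional Chinese Medicine]; R249 [Automation and Computer Technology: Control Theory and Control Engineering]; TP183 [Automation and Computer Technology: Control Science and Engineering]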