Research on Large Language Model-based Few-shot Medical Named Entity Recognition


Authors: ZHAO Congpu [1], ZHU Weiguo [1], ZHAO Fei, GUO Anhui

Affiliations: [1] Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Beijing 100730, China; [2] Center for Health Statistics and Information, National Health Commission, Beijing 100810, China

Source: Chinese Journal of Health Informatics and Management, 2024, No. 6, pp. 902-908, 914 (8 pages)

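Funding: Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences, project "Research on Key Technologies for Medical Knowledge Management and Intelligent Knowledge Services" (2021-1-I2M-056).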

Abstract: Objective: To achieve few-shot medical named entity recognition using a large language model. Methods: The medical named entity recognition task is recast as a text generation task, and a task-specific prompt template is constructed so that the large language model generates the label sequence of medical entities during text generation. A small number of similar annotated samples are retrieved from a medical text corpus as demonstrations and combined with in-context learning to perform medical named entity recognition in few-shot scenarios. Results: The proposed method achieved a precision, recall, and F1 score of 50.54%, 47.12%, and 48.77%, respectively, all significantly higher than those of traditional machine learning and deep learning algorithms. Appropriate use of multiple samples as demonstrations further improved the model's predictive performance. Conclusion: The proposed method requires no parameter updates to the model and depends very little on data annotation, which improves its generalization ability.
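The retrieval-plus-prompt step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the prompt wording, or the label format, so the character-bigram Jaccard similarity, the instruction text, the `entity [type]` output format, and the toy sentences below are all assumptions made for illustration. The call to the language model itself is left out.

```python
from typing import List, Tuple

# Hypothetical annotated pool item: (sentence, [(entity text, entity type), ...]).
Example = Tuple[str, List[Tuple[str, str]]]


def char_bigrams(text: str) -> set:
    """Character bigrams; a crude stand-in for a real sentence encoder."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character bigrams (assumed metric, not from the paper)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return len(x & y) / len(x | y)


def retrieve(query: str, pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples most similar to the query sentence."""
    ranked = sorted(pool, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]


def format_labels(entities: List[Tuple[str, str]]) -> str:
    """Render gold entities in the (assumed) output format the model should imitate."""
    return "; ".join(f"{text} [{etype}]" for text, etype in entities) or "none"


def build_prompt(query: str, pool: List[Example], k: int = 2) -> str:
    """Assemble instruction, retrieved demonstrations, and the query into one prompt."""
    lines = [
        "Extract the medical entities from the sentence and label each with its type.",
        "",
    ]
    for sentence, entities in retrieve(query, pool, k):
        lines += [f"Sentence: {sentence}", f"Entities: {format_labels(entities)}", ""]
    lines += [f"Sentence: {query}", "Entities:"]
    return "\n".join(lines)


# Toy data, invented for this sketch only.
pool: List[Example] = [
    ("The patient was given aspirin for chest pain.",
     [("aspirin", "drug"), ("chest pain", "symptom")]),
    ("MRI of the brain showed no lesion.",
     [("MRI", "test"), ("brain", "body part")]),
    ("She reports chest pain and shortness of breath.",
     [("chest pain", "symptom"), ("shortness of breath", "symptom")]),
]
query = "He complains of chest pain after exercise."
prompt = build_prompt(query, pool, k=2)
print(prompt)
```

The prompt ends with an open `Entities:` line, so a text-generation model continues it with the label sequence for the query sentence; no model parameters are updated, and only the small retrieved pool needs annotation.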

Keywords: large language model; medical named entity recognition; few-shot

Classification codes: R-039 [Medicine and Health]; R319

 
