ChatGPT and Zero-Shot Prompt-based Structured Item Information Extraction from Clinical Scale Text

Authors: Hao Jie, Mo Zhiqiang, Sun Haixia[1], Chen Zhenli, Li Jiao[1]

Affiliations: [1] Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020; [2] Department of Computer Science, University of Science and Technology of China, Hefei 230027

Source: Library and Information Service (《图书情报工作》), 2024, No. 22, pp. 139-152 (14 pages)

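Funding: This paper is one of the outputs of the National Social Science Fund of China project "Research on Semantic Interconnection of Scale Resources Based on Knowledge Organization" (Grant No. 21BTQ069) and the CAMS Innovation Fund for Medical Sciences project "Research on Key Technologies for Medical Knowledge Management and Intelligent Knowledge Services" (Grant No. 2021-I2M-1-056).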

Abstract: [Purpose/Significance] This study uses ChatGPT to extract structured item information from free-text clinical scales without any annotated data, in order to efficiently advance the structuring and intelligent use of medical scale resources.

[Method/Process] We defined an item information extraction framework with eight attribute types that accounts for structural differences in the measurement concepts of clinical scales, and built a dataset from 59 documents of commonly used clinical psychological rating scales. Zero-shot prompt templates were designed separately for each category of scale, distinguished by measurement-concept level, and experiments were run through the official ChatGPT-3.5 and ChatGPT-4 APIs. The extraction performance of the two ChatGPT versions on different clinical scale texts, and the factors that may influence it, were analyzed from multiple perspectives.

[Result/Conclusion] Extraction of the source attribute performed best, with Micro-F1 and Macro-F1 of at least 98.90% and 97.83%, respectively. Response options, instructions for use, and scoring rules followed; item numbers and item instructions were in the middle; clinical interpretation performed worst, with Micro-F1 and Macro-F1 of 47.73% and 45.51%, respectively. ChatGPT-4 performed better overall, although its recall on some attributes was lower than that of ChatGPT-3.5. Model performance declined as the number of measurement-concept levels, the number of dimensions, the number of items, and the text length increased. In summary, ChatGPT can efficiently assist in structuring medical scale resources, especially for simple scales.
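The abstract reports both Micro-F1 and Macro-F1 for each attribute. As a minimal sketch of how the two averages can diverge, the Python below assumes that, for one attribute, each scale document yields counts of true positives, false positives, and false negatives, and that Macro-F1 is the unweighted mean of per-scale F1 scores. The averaging unit is an assumption (the abstract does not state it), and the names `Counts`, `micro_f1`, and `macro_f1` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-scale counts for one attribute."""
    tp: int  # extracted values that match the gold annotation
    fp: int  # extracted values with no gold match
    fn: int  # gold values the model missed


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall;
    # returns 0.0 when there is nothing to score.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_scale: list[Counts]) -> float:
    # Pool the counts over all scales first, then compute a single F1.
    return f1(
        sum(c.tp for c in per_scale),
        sum(c.fp for c in per_scale),
        sum(c.fn for c in per_scale),
    )


def macro_f1(per_scale: list[Counts]) -> float:
    # Compute F1 for each scale, then take the unweighted mean.
    scores = [f1(c.tp, c.fp, c.fn) for c in per_scale]
    return sum(scores) / len(scores) if scores else 0.0


# Illustrative numbers only: one large, well-extracted scale and one small, poorly extracted one.
scales = [Counts(tp=90, fp=5, fn=5), Counts(tp=1, fp=1, fn=2)]
print(f"micro={micro_f1(scales):.3f} macro={macro_f1(scales):.3f}")  # prints micro=0.933 macro=0.674
```

Micro-averaging is dominated by the scale with the most extracted values, while macro-averaging gives every scale equal weight, so a few small scales with poor extraction pull Macro-F1 below Micro-F1. The counts above are invented for illustration and are not results from the study.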
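The method relies on zero-shot prompts, that is, prompts that describe the task and the expected output without any annotated examples. Below is a minimal sketch of such a prompt builder and of parsing a JSON reply. The prompt wording, the JSON output format, and every name used (`ATTRIBUTES`, `build_zero_shot_prompt`, `parse_items`) are hypothetical and are not the authors' templates; only the seven attributes named in the abstract are listed, because the eighth is not named there. The API request itself is omitted; the prompt string would be sent as a user message to the model.

```python
from __future__ import annotations

import json

# Hypothetical keys for the seven attributes named in the abstract
# (the framework defines eight; the eighth is not named there).
ATTRIBUTES: dict[str, str] = {
    "source": "source",
    "item_number": "item number",
    "item_instruction": "item instruction",
    "response_options": "response options",
    "usage_instructions": "instructions for use",
    "scoring_rules": "scoring rules",
    "clinical_interpretation": "clinical interpretation",
}

FENCE = "`" * 3  # Markdown code fence; chat models often wrap JSON in one


def build_zero_shot_prompt(scale_text: str, attributes: dict[str, str] = ATTRIBUTES) -> str:
    """Zero-shot: state the task and the output schema, with no worked examples."""
    fields = "\n".join(f'- "{key}": {label}' for key, label in attributes.items())
    return (
        "Extract every item from the clinical scale text below.\n"
        "Return only a JSON array with one object per item, using these keys "
        "(use null when an attribute is absent):\n"
        f"{fields}\n\n"
        f"Scale text:\n{scale_text}"
    )


def parse_items(reply: str, attributes: dict[str, str] = ATTRIBUTES) -> list[dict]:
    """Parse a model reply into item dicts, tolerating an optional code fence."""
    text = reply.strip()
    if text.startswith(FENCE):
        _, _, text = text.partition("\n")  # drop the opening fence line (may carry a "json" tag)
        text = text.rsplit(FENCE, 1)[0]  # drop the closing fence
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    # Keep only the requested keys; attributes the model omitted become None.
    return [{key: item.get(key) for key in attributes} for item in items]


reply = FENCE + 'json\n[{"item_number": "1", "item_instruction": "Feeling sad"}]\n' + FENCE
print(parse_items(reply)[0]["item_number"])  # prints 1
```

Requesting a fixed set of JSON keys and normalizing missing ones to `None` makes every reply comparable attribute by attribute against a gold annotation, which is what per-attribute scoring requires.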

Keywords: medical scales; text structuring; attribute extraction; large language models; zero-shot learning

CLC numbers: P391 [Astronomy and Earth Sciences: Geophysics]; G255 [Culture and Science: Library Science]

 
