A study on multi-class classification of medical questionnaire item texts based on prompt learning (基于提示学习的医学量表问题文本多分类研究)

Cited by: 2


Authors: HAO Jie (郝洁); PENG Qinglong (彭庆龙); CONG Shan (丛山); LI Jiao (李姣)[1]; SUN Haixia (孙海霞)[1]

Affiliations: [1] Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, P.R. China; [2] Qingdao Innovation Development Base, Harbin Engineering University, Qingdao 266000, Shandong, P.R. China

Source: Chinese Journal of Evidence-based Medicine (《中国循证医学杂志》), 2024, Issue 1, pp. 76-82 (7 pages)

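Funding: National Social Science Fund of China (No. 21BTQ069); CAMS Innovation Fund for Medical Sciences (No. 2021-I2M-1-056); National Key Research and Development Program of China (No. 2022YFC3601005).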

Abstract:

Objective: Medical questionnaire resources are currently processed and organized mainly at the document level, which hinders retrieval and reuse at the item level. This study proposes a multi-class classification method for medical questionnaire items in low-resource scenarios, to support fine-grained organization and delivery of medical questionnaire resources.

Methods: We used a prompt learning classification method based on the pre-trained language model BERT to perform multi-class classification of item texts. First, we collected clinical assessment questionnaires for lung cancer, extracted function and domain classification labels, and manually annotated the items with "function-domain" combination labels to form a small-sample corpus of lung cancer clinical assessment items. We then applied prompt learning: each item was placed in a custom template and fed to BERT, which predicted and filled the masked position. Finally, the filled text was mapped to a label, yielding the class of the item.

Results: The corpus comprised 347 lung cancer clinical assessment items covering nine "function-domain" labels. On this self-constructed corpus, the proposed method achieved an average accuracy of 93%, about 6% higher than the second-best model, GAN-BERT.

Conclusion: The BERT-based prompt learning classification method maintains strong performance while reducing the cost of building medical questionnaire item corpora, and is worth wider adoption in research and practice on medical questionnaire item classification.
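The three steps named in the Methods (wrap the item in a template, predict the masked position, map the filled word to a label) might be sketched roughly as follows. This is an illustration only, not the authors' implementation: the template wording, the label names, the label words, and the `toy_mask_scores` stand-in for BERT are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"

# Hypothetical template; the abstract does not give the real one.
TEMPLATE = "{item} This question is about {mask}."

# Hypothetical verbalizer: "function-domain" label -> words that may fill the mask.
VERBALIZER: Dict[str, List[str]] = {
    "assessment-symptom": ["symptom", "pain"],
    "assessment-emotion": ["mood", "emotion"],
    "screening-lifestyle": ["habit", "smoking"],
}


def build_prompt(item: str) -> str:
    """Wrap a questionnaire item in the template with one masked slot."""
    return TEMPLATE.format(item=item, mask=MASK)


def classify(item: str, mask_scores_fn: Callable[[str], Dict[str, float]]) -> str:
    """Score each label by the mean mask-position score of its label words."""
    scores = mask_scores_fn(build_prompt(item))
    label_scores = {
        label: sum(scores.get(word, 0.0) for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }
    return max(label_scores, key=label_scores.get)


def toy_mask_scores(prompt: str) -> Dict[str, float]:
    """Keyword-based stand-in for a masked language model such as BERT.

    A real system would return the model's probabilities for candidate
    tokens at the [MASK] position instead of these fixed numbers.
    """
    text = prompt.lower()
    if "pain" in text or "cough" in text:
        return {"symptom": 0.6, "pain": 0.3, "mood": 0.05}
    if "worried" in text or "sad" in text:
        return {"mood": 0.5, "emotion": 0.3, "symptom": 0.1}
    return {"habit": 0.4, "smoking": 0.2, "symptom": 0.1}


if __name__ == "__main__":
    for question in [
        "Have you had chest pain in the past week?",
        "Do you feel worried or sad?",
    ]:
        print(question, "->", classify(question, toy_mask_scores))
```

Swapping `toy_mask_scores` for a function that queries a masked language model would leave `build_prompt` and `classify` unchanged, which is the appeal of keeping the template and the label mapping separate from the model call.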

Keywords: medical questionnaire; question classification; multi-class classification; prompt learning; pre-trained language model

Classification codes: R-05 [Medicine and Health]; TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
