Medical Text Classification Based on Transfer Learning and Ensemble Learning

Cited by: 1

Authors: ZHENG Cheng-yu (郑承宇), WANG Xin (王新) [1], WANG Ting (王婷), XU Quan-feng (徐权峰) (School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)

Affiliation: [1] School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan 650500, China

Source: Computer Technology and Development (《计算机技术与发展》), 2022, No. 4, pp. 28-33 (6 pages)

Funding: National Natural Science Foundation of China (61363022); Scientific Research Fund of the Yunnan Provincial Department of Education (2021Y670).

Abstract: To address the sparse semantics and excessive dimensionality of medical text, this paper proposes TLCM (Trans-LSTM-CNN-Multi), a multi-label medical text classification algorithm based on transfer learning and ensemble learning. The algorithm first trains on a large-scale corpus using the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) model to obtain dynamic character-vector representations of general-domain text. Then, a target dataset from the medical domain is used, through transfer learning and model fine-tuning, to semantically enhance the ALBERT pre-trained language model for medical text. The semantically enhanced representation obtained through transfer learning is then fed into a Bi-LSTM-CNN ensemble learning module, which further extracts the important features of the medical text content. Finally, a multi-label text classifier built on the binary cross-entropy loss function performs the medical text classification. Experimental results show that TLCM effectively improves classification performance on medical text, reaching an overall F1 score of 91.8% on a Chinese health question dataset.
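The final stage described in the abstract, a multi-label classifier trained with binary cross-entropy and evaluated by F1, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the 0.5 decision threshold, the toy logits, and the choice of micro-averaged F1 are all assumptions made for illustration (the abstract does not say which F1 averaging was used). In TLCM the logits would come from the Bi-LSTM-CNN module on top of the fine-tuned ALBERT encoder; here they are hard-coded.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over every (sample, label) pair.

    Each label is treated as an independent yes/no decision, which is
    what makes the loss suitable for multi-label classification.
    """
    total, count = 0.0, 0
    for row_logits, row_targets in zip(logits, targets):
        for z, y in zip(row_logits, row_targets):
            # Clamp the probability so log() never receives 0.
            p = min(max(sigmoid(z), eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def predict(logits, threshold=0.5):
    """Turn logits into 0/1 label vectors (threshold is an assumption)."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


def micro_f1(preds, targets):
    """Micro-averaged F1: pool TP/FP/FN over all samples and labels."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(preds, targets):
        for p, t in zip(pred_row, target_row):
            tp += p * t
            fp += p * (1 - t)
            fn += (1 - p) * t
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: 3 health questions, 3 hypothetical labels.
logits = [[2.0, -1.0, 0.5], [-0.5, 3.0, -2.0], [1.5, 0.2, -0.3]]
targets = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]

loss = bce_loss(logits, targets)
preds = predict(logits)
score = micro_f1(preds, targets)
print(f"BCE loss: {loss:.4f}, micro-F1: {score:.4f}")
```

Because each label gets its own sigmoid rather than sharing a softmax, a single question can be assigned several labels at once, which is the property a multi-label setting requires.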

Keywords: transfer learning; ensemble learning; ALBERT; Bi-LSTM-CNN; medical text; health questions

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

