Establishment and evaluation of an XGBoost-based early prediction model for severe-disease risk in COVID-19 patients (cited by: 7)

Early prediction model for disease progression of COVID-19 patients based on XGBoost: establishment and evaluation

Authors: WANG Ming; CHENG Zhenhao; HU Miao; TANG Mingcheng; XU Fumin; WANG Li [2]; NIAN Yongjian [2]; LIU Kaijun (Department of Gastroenterology, Army Medical Center of PLA, Chongqing, 400042; Students' Team 5, College of Basic Medical Sciences, Army Medical University (Third Military Medical University), Chongqing, 400038; Faculty of Biomedical Engineering and Imaging Medicine, Army Medical University (Third Military Medical University), Chongqing, 400038; First Department of Infectious Diseases, Wuhan Huoshenshan Hospital, Wuhan, Hubei Province, 430010, China)

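Affiliations: [1] Department of Gastroenterology, Army Medical Center of PLA, Chongqing 400042 [2] Faculty of Biomedical Engineering and Imaging Medicine, Army Medical University (Third Military Medical University), Chongqing 400038 [3] Students' Team 5, College of Basic Medical Sciences, Army Medical University (Third Military Medical University), Chongqing 400038 [4] First Department of Infectious Diseases, Wuhan Huoshenshan Hospital, Wuhan 430010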

Source: Journal of Army Medical University, 2022, Issue 3, pp. 195-202 (8 pages)

Abstract:

Objective: To construct an XGBoost prediction model from the clinical characteristics of patients with coronavirus disease 2019 (COVID-19), and to evaluate its performance for early prediction of the risk of progression to severe disease.

Methods: Laboratory-confirmed COVID-19 patients in the medical record system of Huoshenshan Hospital from February 10 to April 5, 2020 were screened, and data from 347 patients with complete medical information and laboratory test results were collected. First, 21 indicators with significant differences were screened out as input features for model training. Bayesian optimization was applied to the constructed XGBoost model to tune its parameters, and the optimal combination of features was selected on the basis of feature importance. The positive and negative effects of each feature's value on the prediction were then analyzed, with SHapley Additive exPlanations (SHAP) used to quantify and attribute the importance of each feature. Finally, the performance of the XGBoost model was evaluated and compared with other machine learning methods, including support vector machine (SVM), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbors (KNN), and its advantages were discussed.

Results: Twenty-one features that differed significantly between the severe and non-severe groups were selected for training and validation. The optimal subset with 10 features in the k-nearest neighbor model obtained the highest area under the curve (AUC) among the 4 models in the validation set. Higher age, pulse rate, white blood cell count, neutrophil count, C-reactive protein, total bilirubin, creatinine, and D-dimer were associated with a higher risk of severe disease, as were lower lymphocyte count and albumin level. XGBoost and support vector machine outperformed the other machine learning methods (AUC 0.9420 and 0.9594 on the test set, respectively), and XGBoost trained markedly faster.

Conclusion: A prediction model based on XGBoost was successfully established, achieving early prediction of the risk of progression to severe disease in COVID-19 patients with an optimal feature subset.
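The abstract compares models by AUC. As a reminder of what that number measures, the following is a minimal sketch in plain Python: AUC computed as the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen non-severe case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = severe, 0 = non-severe)
    scores: iterable of predicted risk scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): 3 severe, 4 non-severe patients.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 11 of the 12 pairs are ordered correctly
print(f"toy AUC = {toy_auc:.4f}")
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; library implementations typically use a sort-based equivalent.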

Keywords: COVID-19; severe-disease risk prediction model; XGBoost; SHAP
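To illustrate the idea behind the SHAP attribution mentioned in the record, here is a minimal sketch that computes exact Shapley values by enumerating feature subsets. It is not the study's pipeline: the risk score `toy_risk_score`, its coefficients, and the patient and baseline values are all hypothetical, and the SHAP library uses far more efficient algorithms for tree models than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x relative to baseline.

    Features outside a coalition are set to their baseline value.
    x and baseline are dicts keyed by feature name.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [j for j in names if j != i]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = {j: x[j] if (j in subset or j == i) else baseline[j]
                          for j in names}
                without_i = {j: x[j] if j in subset else baseline[j]
                             for j in names}
                total += weight * (value_fn(with_i) - value_fn(without_i))
        phi[i] = total
    return phi


def toy_risk_score(f):
    """Hypothetical score (NOT the paper's model): risk rises with age and
    C-reactive protein, including an interaction, and falls with lymphocytes."""
    return (0.03 * f["age"] + 0.01 * f["crp"] - 0.5 * f["lymphocytes"]
            + 0.0005 * f["age"] * f["crp"])


# Hypothetical patient and reference values, chosen only for illustration.
patient = {"age": 70, "crp": 50, "lymphocytes": 0.6}
reference = {"age": 50, "crp": 10, "lymphocytes": 1.5}

contributions = shapley_values(toy_risk_score, patient, reference)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

In this toy setting the contributions sum to the difference between the patient's score and the reference score, and a lymphocyte count below the reference value receives a positive contribution, matching the direction of effect described in the abstract.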

Classification codes: R319 [Medicine and Health: Basic Medicine]; R512.99
