Authors: WANG Ming; CHENG Zhenhao; HU Miao; TANG Mingcheng; XU Fumin; WANG Li [2]; NIAN Yongjian [2]; LIU Kaijun
Affiliations: [1] Department of Gastroenterology, Army Medical Center of PLA, Chongqing 400042; [2] Faculty of Biomedical Engineering and Imaging Medicine, Army Medical University (Third Military Medical University), Chongqing 400038; [3] Students' Team 5, College of Basic Medical Sciences, Army Medical University (Third Military Medical University), Chongqing 400038; [4] First Department of Infectious Diseases, Wuhan Huoshenshan Hospital, Wuhan, Hubei Province 430010, China
Source: Journal of Army Medical University, 2022, No. 3, pp. 195-202 (8 pages)
Abstract: Objective: To construct an XGBoost prediction model from the clinical characteristics of patients with coronavirus disease 2019 (COVID-19), and to evaluate how well the model predicts, at an early stage, the risk of progression to severe disease. Methods: Laboratory-confirmed COVID-19 patients recorded in the medical record system of Huoshenshan Hospital from February 10 to April 5, 2020 were screened, and data were collected from 347 patients with complete medical information and laboratory test results. First, 21 indicators with significant differences were selected as input features for model training. Bayesian optimization was applied to the XGBoost model to tune its parameters, and the optimal feature combination was selected based on feature importance. To analyze whether the magnitude of each feature pushes the prediction up or down, SHapley Additive exPlanations (SHAP) was used to quantify and attribute the importance of each feature. Finally, the performance of the XGBoost model was evaluated and compared with other machine learning methods, including support vector machine (SVM), naïve Bayes (NB), logistic regression (LR), and k-nearest neighbors (KNN). Results: 21 features that differed significantly between the severe and non-severe groups were selected for training and validation. The optimal subset of 10 features in the k-nearest neighbor model obtained the highest area under the curve (AUC) among the 4 models on the validation set. Higher age, pulse rate, white blood cell count, neutrophil count, C-reactive protein, total bilirubin, creatinine, and D-dimer were associated with a higher risk of severe disease; lower lymphocyte count and albumin level were associated with a higher risk of severe disease. XGBoost and the support vector machine outperformed the other machine learning methods (AUC of 0.9420 and 0.9594 on the test set, respectively), and XGBoost trained significantly faster. Conclusion: A prediction model based on XGBoost was successfully built, achieving early prediction of the risk of progression to severe disease in COVID-19 patients using the optimal feature subset.
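The abstract compares models by their AUC on a held-out test set. As a rough illustration of what that metric measures, the sketch below computes AUC as the probability that a randomly chosen positive (severe) case receives a higher score than a randomly chosen negative (non-severe) case. The function name `roc_auc` and the toy labels and scores are assumptions made up for this example; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = severe, 0 = non-severe (not from the paper).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 1.0 means every severe case is ranked above every non-severe case, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients but would normally be replaced by a sort-based computation on larger data.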