Construction and evaluation of a SHAP-interpretable diagnostic model for liver cirrhosis in patients with chronic viral hepatitis B

Construction and evaluation of a diagnostic model for liver cirrhosis in patients with chronic hepatitis B based on machine learning


Authors: ZHANG Guoshun (张国顺); HONG Guoyi (洪国议); WU Tongtong (吴童童); MEI Dongxue (梅冬雪); XIN Yingying (辛英瑛); HAN Chao (韩超); JIANG Meiyu (蒋美钰); WANG Suying (王素颖) (Department of Gastroenterology, Affiliated Hospital of North China University of Science and Technology, Tangshan 063000, China)

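Affiliation: [1] Department of Gastroenterology, Affiliated Hospital of North China University of Science and Technology, Tangshan 063000, Hebei Province, China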

Source: Chinese Journal of Coal Industry Medicine (《中国煤炭工业医学杂志》), 2024, Issue 6, pp. 617-625 (9 pages)

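Funding: Hebei Provincial Medical Science Research Project (No. 20240401); Wang Bao-En Liver Fibrosis Research Fund of the China Foundation for Hepatitis Prevention and Control (No. 2025030)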

Abstract: Objective To construct an optimal machine learning-based diagnostic model for liver cirrhosis in patients with chronic viral hepatitis B, providing a reference for clinicians to identify high-risk individuals and for the early prevention of cirrhosis in these patients. Methods This retrospective study enrolled 420 patients treated in the outpatient and inpatient departments of the Department of Gastroenterology, Affiliated Hospital of North China University of Science and Technology, between January 2020 and January 2024: 200 with a confirmed diagnosis of chronic viral hepatitis B with cirrhosis and 220 with chronic viral hepatitis B without cirrhosis. All patients were randomly divided into a training set and a test set at a ratio of 7:3. A further 150 patients with chronic viral hepatitis B, with or without cirrhosis, seen in the hospital's outpatient and inpatient departments from January to December 2019 were retrospectively collected as an external validation set. General data (such as age and sex) and laboratory data (such as routine blood and blood biochemistry tests) were compared between the cirrhosis and non-cirrhosis groups, and multivariate logistic regression was used to identify independent factors associated with cirrhosis in chronic viral hepatitis B. XGBoost, logistic regression, LightGBM, KNN, and SVM models were built on the training set and validated on the validation set. Model performance was evaluated with the receiver operating characteristic (ROC) curve, area under the curve (AUC), accuracy, F1 score, and other metrics, and the best model was selected. The Shapley Additive exPlanations (SHAP) method was used to present the best model. Results Total bilirubin [OR=1.046, 95%CI: 1.006~1.085], red cell distribution width-to-platelet ratio (RPR) [OR=1.417, 95%CI: 1.250~1.666], and HBV DNA load [OR=15.855, 95%CI: 4.032~25.485] were independent risk factors for cirrhosis in patients with chronic hepatitis B (P<0.05); hemoglobin [OR=0.954, 95%CI: 0.927~0.978] and antiviral therapy [OR=0.014, 95%CI: 0.002~0.056] were protective factors against cirrhosis (P<0.05). Based on these independent factors, XGBoost, logistic regression, LightGBM, KNN, and SVM models were constructed; all achieved high AUC values, and the logistic regression model performed best. The sensitivity, specificity, accuracy, F1 score, and AUC of the logistic regression model were 0.929, 0.970, 0.947, 95.1% (95%CI: 93.3%~96.9%), and 0.988, respectively;

Objective To construct prediction models for liver cirrhosis in patients with chronic viral hepatitis B using machine learning methods, compare their performance, and select the best-performing model, so as to provide a theoretical reference and clinical guidance for clinicians to identify high-risk individuals and take targeted preventive measures. Methods This study retrospectively collected data from 200 patients diagnosed with chronic hepatitis B (CHB) complicated by cirrhosis (cirrhosis group) and 220 patients diagnosed with CHB without cirrhosis (non-cirrhosis group) at the Department of Gastroenterology, Affiliated Hospital of North China University of Science and Technology, between January 2020 and January 2024. Both groups were randomly split into training and testing sets at a 7:3 ratio. The clinical and laboratory data of both groups were analyzed statistically. Univariate analysis was first performed to identify statistically significant factors, followed by multivariate logistic regression analysis to determine independent risk factors for cirrhosis in patients with CHB. Several machine learning models, including XGBoost, logistic regression, LightGBM, KNN, and SVM, were constructed using the training set. The performance of these models was evaluated based on receiver operating characteristic (ROC) curves, area under the curve (AUC), accuracy, and F1 score, and the best-performing model was selected for further analysis. Calibration curves and decision curves were used to assess the generalization ability of the model. To improve the robustness of the models, the bootstrap method was used for internal validation. In addition, a separate external validation set of 150 patients diagnosed with CHB, with or without cirrhosis, at the hospital from January to December 2019 was retrospectively collected. The optimal model's performance was evaluated in the external validation set
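The evaluation metrics and odds ratios named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the standard error passed to `odds_ratio_ci` are all hypothetical, and the sketch assumes both classes are present and at least one positive prediction is made. It computes sensitivity, specificity, accuracy, and F1 from a confusion matrix, AUC as the fraction of positive/negative pairs ranked correctly, and an odds ratio with a Wald 95% confidence interval from a logistic regression coefficient.

```python
import math


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and F1 for binary labels (1 = cirrhosis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # recall on the cirrhosis class
    specificity = tn / (tn + fp)   # recall on the non-cirrhosis class
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Toy data (hypothetical, not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)
# A coefficient of ln(1.046) gives OR = 1.046; the SE of 0.02 is made up.
or_value, ci_low, ci_high = odds_ratio_ci(math.log(1.046), 0.02)
```

An odds ratio whose confidence interval lies entirely above 1 indicates a risk factor, and one entirely below 1 indicates a protective factor, which is how the intervals in the Results are read.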

Keywords: chronic viral hepatitis B; liver cirrhosis; machine learning; prediction model

Classification number: R575.2 [Medicine and Health: Digestive System]

 
