Blood-based diagnostic model for pulmonary tuberculosis optimized by machine learning: A multicenter study


Authors: ZHOU Jing; DING Shoupeng; CAI Zihan (Department of Laboratory, Siyang Hospital, Siyang 223700; Department of Laboratory, Gutian County Hospital, Ningde 352200, China)

Affiliations: [1] Department of Laboratory, Siyang Hospital, Siyang 223700 [2] Department of Laboratory, Gutian County Hospital, Ningde 352200

Source: Journal of Clinical and Pathological Research, 2025, Issue 1, pp. 25-37 (13 pages)

Funding: First In-Hospital Research Program of Siyang Hospital and the Affiliated Hospital of Jiangsu University (2024SY005).

Abstract: Objective: Pulmonary tuberculosis is a serious infectious disease worldwide, and accurate, rapid diagnosis is crucial for reducing transmission and optimizing treatment. Existing diagnostic methods are costly and hard to deploy in resource-limited regions, so an economical and efficient diagnostic tool based on blood indicators is of considerable value. Methods: This study included 121 patients with pulmonary tuberculosis (tuberculosis group) and 101 healthy controls (healthy control group). Blood indicators were analyzed statistically to identify features significantly associated with pulmonary tuberculosis. Three machine learning algorithms, eXtreme Gradient Boosting (XGBoost), Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Boruta, were used for feature selection, and multiple machine learning models were built from the selected features. The Shapley Additive Explanations (SHAP) method was then used to explain the importance and contribution of each model feature, and to further analyze how the features act and how they affect classification performance. Results: Several blood indicators differed significantly between the tuberculosis group and the healthy control group, including lymphocyte percentage (LYM%), eosinophil percentage (EOS%), aspartate aminotransferase (AST), eosinophil absolute count (EOS#), and neutrophil absolute count (NEU#). XGBoost selected 34 important features, SVM-RFE performed best with 5 features, and Boruta identified 15 significant features. The intersection of the three algorithms contained 5 core features (LYM%, EOS%, AST, EOS#, NEU#). In model construction, XGBoost achieved areas under the receiver operating characteristic curve (AUC) of 0.989, 0.975, and 0.969 on the training, validation, and external validation groups, respectively, with a correct classification rate of 94% on the validation set, the best performance among the models. SHAP analysis further confirmed that LYM% made a significant positive contribution to the model's predictions, whereas AST and EOS# made negative contributions; significant interactions between features were also observed. Conclusion: By integrating blood indicators with machine learning algorithms, this study successfully developed an efficient diagnostic model for pulmonary tuberculosis with high accuracy and good generalization ability...
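The two quantitative steps named in the abstract, intersecting the feature sets chosen by several selectors and scoring a classifier by AUC and correct classification rate, can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names (`core_features`, `roc_auc`, `accuracy`), the `other_NN` placeholder features, the 0.5 decision threshold, and the eight synthetic samples are all assumptions made for illustration. Only the five core feature names and the selector set sizes (34, 5, 15) come from the abstract.

```python
from itertools import product


def core_features(*selections):
    """Return the features chosen by every selector, sorted by name."""
    return sorted(set.intersection(*(set(s) for s in selections)))


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# The five core features are named in the abstract; the "other_NN" entries are
# hypothetical placeholders used only to reach the reported set sizes.
core = ["LYM%", "EOS%", "AST", "EOS#", "NEU#"]
xgb_selected = core + [f"other_{i:02d}" for i in range(29)]     # 34 features
svm_rfe_selected = list(core)                                   # 5 features
boruta_selected = core + [f"other_{i:02d}" for i in range(10)]  # 15 features

shared = core_features(xgb_selected, svm_rfe_selected, boruta_selected)
print(shared)

# Synthetic labels (1 = tuberculosis, 0 = healthy control) and model scores;
# these are NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(roc_auc(y_true, y_score))
print(accuracy(y_true, y_score))
```

The pairwise AUC definition is used here because it needs no external library and makes the meaning of the metric explicit; for real cohorts a library implementation would normally be preferred.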

Keywords: machine learning; blood indicators; diagnostic model; pulmonary tuberculosis; treatment optimization

Classification number: R521 [Medicine and Health - Internal Medicine]

 
