Machine Learning Models Predict Risk of Pre-Diabetes Occurrence


Authors: YANG Qianqian; LI Hong; YIN Gaojun[3]; MA Ning; Maibubaimu Aisikaer; CAI Huizhen[1]

Affiliations: [1] Ningxia Medical University, Yinchuan 750004, China; [2] Ningxia Vocational and Technical College, Yinchuan 750004, China; [3] Health Management Center, General Hospital of Ningxia Medical University, The First Clinical Medical College of Ningxia Medical University, Yinchuan 750004, China

Source: Journal of Ningxia Medical University, 2025, Issue 1, pp. 82-88 (7 pages)

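Funding: National Natural Science Foundation of China, Regional Program (82360645); Scientific Research Project of Higher Education Institutions, Department of Education of Ningxia Hui Autonomous Region (NYG2024298).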

Abstract:

Objective: To build an accurate, highly sensitive pre-diabetes risk prediction model from simple, readily available physical examination data using machine learning algorithms, and to visualize the optimal model as a nomogram, providing a theoretical basis for the early identification of diabetes-related risk factors.

Methods: Physical examination data were collected from 76,618 non-diabetic individuals screened at the Health Management Center of the General Hospital of Ningxia Medical University between September 2019 and September 2020. The non-pre-diabetic population was treated as the positive class and the pre-diabetic population as the negative class. Under-sampling was used to adjust the ratio of positive to negative samples to 1:1, and 6,153 subjects were finally included. Lasso regression was used to select feature variables, and prediction models were built with logistic regression, support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and an artificial neural network. Sensitivity, specificity, accuracy, F1 score, and area under the curve (AUC) were used as evaluation metrics. Variable importance was assessed, and a nomogram was constructed from the optimal model.

Results: Lasso regression selected 11 features, including age, body mass index (BMI), diastolic blood pressure, systolic blood pressure, and triglycerides. The logistic regression and XGBoost models performed comparatively well. For the logistic regression model, accuracy, sensitivity, specificity, F1 score, and AUC were 0.7492, 0.7237, 0.7746, 0.7560, and 0.8256, respectively; for the XGBoost model, they were 0.7513, 0.8098, 0.6925, 0.7657, and 0.8288, respectively. The top five features by SHAP importance in the XGBoost model were age, triglycerides, aspartate/alanine, systolic blood pressure, and BMI. The nomogram's predicted risk of pre-diabetes agreed well with the observed risk.

Conclusion: Compared with existing pre-diabetes screening models, the logistic regression and XGBoost models show a stronger ability to screen for pre-diabetes, and the nomogram predicts the risk of pre-diabetes intuitively and clearly.
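The abstract names two steps that are easy to illustrate in isolation: random under-sampling to a 1:1 class ratio, and the evaluation metrics (sensitivity, specificity, accuracy, F1, AUC). The following is a minimal sketch of both using only the Python standard library. It is not the authors' pipeline, which the abstract does not show. The function names, the `label` key, and the 0/1 label encoding are hypothetical choices made for illustration, and the AUC is computed by direct pairwise comparison, which suits only small samples.

```python
import random


def undersample(records, label_key="label", seed=0):
    """Randomly drop records from the larger class until both classes match in size."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))  # sample without replacement
    rng.shuffle(balanced)
    return balanced


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and F1 from hard class predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: 3 records of class 1 and 9 of class 0, balanced to 3 and 3.
toy = [{"id": i, "label": 1 if i < 3 else 0} for i in range(12)]
balanced = undersample(toy)
print(len(balanced))

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1]
print(binary_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

The `positive` argument is explicit because the abstract treats the non-pre-diabetic group as the positive class. Sensitivity, specificity, and F1 all depend on which class is designated positive, so the reported values should be read with that convention in mind.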

Keywords: pre-diabetes; machine learning; prediction; nomogram

Classification number: R589.1 [Medicine and Health: Endocrinology]

 
