Authors: WANG Ping (王萍); ZHANG Le (张乐); HONG Xiaorui (洪小瑞); ZHU Suling (朱素玲); ZHAO Xuejing (赵学靖) (Department of Epidemiology and Health Statistics, School of Public Health, Lanzhou University, Lanzhou 730000, China; Institute of Probability and Statistics, School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China)
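Affiliations: [1] Institute of Epidemiology and Health Statistics, School of Public Health, Lanzhou University, Lanzhou 730000 [2] Institute of Probability and Statistics, School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000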
Source: Chinese Journal of Disease Control & Prevention (《中华疾病控制杂志》), 2024, No. 9, pp. 1005-1009 (5 pages)
Funding: National Natural Science Foundation of China (11971214).
Abstract: Objective To improve the predictive performance of classification models for blood glucose control in diabetic patients by using resampling algorithms. Methods Imbalanced blood glucose control data for diabetic patients from the China Health and Retirement Longitudinal Study (CHARLS) database were resampled. The classification performance of logistic regression (LR), support vector machines (SVM), and random forest (RF) was compared before and after resampling. Stratified 5-fold cross-validation and the area under the receiver operating characteristic (ROC) curve (AUC) were used to select the optimal model parameters. Accuracy, sensitivity, specificity, precision, geometric mean (G-mean), F1 score, and AUC served as the evaluation metrics. Results All of the resampling algorithms examined improved the sensitivity, G-mean, and F1 score of the three classification models. The oversampling algorithm adaptive synthetic sampling (ADASYN) and the combined sampling algorithms, synthetic minority over-sampling technique with edited nearest neighbors (SMOTE-ENN) and synthetic minority over-sampling technique with Tomek links (SMOTE-Tomek), raised the AUC of all three models to varying degrees: ADASYN increased the AUC of the LR model by 2.13%, SMOTE-ENN increased the AUC of the LR model by 3.05%, and SMOTE-Tomek increased the AUC of the RF model by 2.13%. Conclusions ADASYN, SMOTE-ENN, and SMOTE-Tomek handle the imbalanced blood glucose control data of diabetic patients well and improve the predictive performance of blood glucose control classification models.
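Note on the evaluation metrics: the abstract names sensitivity, specificity, precision, G-mean, and F1 score but does not state their formulas. The sketch below uses the usual binary-classification definitions, with G-mean taken as the square root of sensitivity times specificity; that this matches the paper's exact definitions is an assumption. The function name `imbalance_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def imbalance_metrics(tp, fn, fp, tn):
    """Compute common imbalanced-data metrics from confusion-matrix counts.

    tp, fn: minority (positive) cases predicted correctly / incorrectly.
    fp, tn: majority (negative) cases predicted incorrectly / correctly.
    """
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fn + fp + tn)
    # G-mean balances performance on both classes (assumed definition).
    g_mean = math.sqrt(sensitivity * specificity)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "g_mean": g_mean,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced test set (50 positives, 150 negatives).
metrics = imbalance_metrics(tp=30, fn=20, fp=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy is high because the majority class dominates, while sensitivity and G-mean expose the weaker minority-class performance. This is why the abstract reports G-mean and F1 alongside accuracy.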