Authors: Li Shuqi (李淑琪), Guang Biao (光彪), Zhao Yufeng (赵玉凤)[2], Chen Jidong (陈继东), Ma Li (马利)[1] (College of Information Engineering, Hubei University of Chinese Medicine, Wuhan 430065)
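Affiliations: [1] College of Information Engineering, Hubei University of Chinese Medicine, 430065; [2] Data Center, China Academy of Chinese Medical Sciences; [3] First Clinical College, Hubei University of Chinese Medicine; [4] Hubei Provincial Hospital of Traditional Chinese Medicine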
Source: Chinese Journal of Health Statistics (《中国卫生统计》), 2023, No. 6, pp. 817-821 (5 pages)
Funding: National Natural Science Foundation of China, General Program (81674101).
Abstract: Objective: To evaluate the predictive performance of SMOTE_ENN hybrid sampling combined with the AdaBoost algorithm in classification models for imbalanced clinical data. Methods: Using grid search over different sampling ratios on real-world data, four hybrid sampling methods (ROS_RUS, SMOTE_RUS, SMOTE_Tomek, and SMOTE_ENN) were each paired with three classification algorithms (DT, SVM, and AdaBoost) to build models, and their performance was compared. Recall, F1 score, and AUC were used as evaluation metrics, averaged over three repetitions of five-fold cross-validation. Two UCI datasets were additionally used for external validation. Results: Among the 12 classification models, SMOTE_ENN hybrid sampling combined with AdaBoost performed best, with Recall, F1 score, and AUC of 0.747, 0.751, and 0.776, respectively; the best sampling rate was 50% SMOTE oversampling combined with 70% ENN undersampling. Conclusion: The SMOTE_ENN hybrid sampling and AdaBoost model can effectively improve clinical outcome prediction on imbalanced data from HT patients, and sampling at the optimal ratio addresses the lack of a clearly specified sampling rate in earlier resampling work. After further validation on public UCI datasets, the model can be applied more broadly.
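The hybrid sampling step named in the abstract can be illustrated with a small standard-library sketch: SMOTE-style interpolation of minority samples followed by edited-nearest-neighbour (ENN) cleaning. This is not the authors' implementation. The abstract does not define how its 50% and 70% sampling rates are computed, so the `ratio` parameter here is an assumption (target minority size as a fraction of the majority size), and the ENN step takes no rate at all. All function names and the toy data are hypothetical, and the AdaBoost classifier, grid search, and cross-validation are omitted.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: sq_dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, ratio, k=5, seed=0):
    """Add synthetic minority samples until the minority class reaches
    int(ratio * majority size). The meaning of `ratio` is an assumption."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_pts)
    n_new = max(0, int(ratio * n_majority) - len(minority_pts))
    k = min(k, len(minority_pts) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples to interpolate")
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(nearest(i, minority_pts, k))
        gap = rng.random()  # position along the segment between the two points
        base, neighbour = minority_pts[i], minority_pts[j]
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Drop every sample whose label differs from the majority vote of its
    k nearest neighbours (cleaning applied to all classes)."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, ratio, seed=0):
    """Hybrid resampling: oversample with SMOTE, then clean with ENN."""
    X_os, y_os = smote(X, y, minority, ratio, seed=seed)
    return enn(X_os, y_os)


def make_toy_data(n_major=20, n_minor=6, seed=1):
    """Hypothetical two-feature imbalanced dataset, for illustration only."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_major)]
    X += [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    X_res, y_res = smote_enn(X, y, minority=1, ratio=0.5)
    print("before:", Counter(y), "after:", Counter(y_res))
```

In practice a library implementation such as `imblearn.combine.SMOTEENN` from imbalanced-learn would normally replace these hand-written functions, with the resampler applied only to the training folds inside cross-validation so that synthetic samples never leak into the evaluation data.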
Keywords: SMOTE; ENN; AdaBoost; clinical prediction model; imbalanced data
Classification code: R195.1 [Medicine and Health: Health Statistics]