Application of the AdaBoost Method to the Classification of Imbalanced Metabolic Syndrome Data  Cited by: 2

Based on the application of AdaBoost + decision tree for metabolic syndrome with imbalanced data


Authors: 闫慈[1] 田翔华[2] 阿拉依.阿汗 张伟文[1] 曹明芹[1]

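Affiliations: [1] School of Public Health, Xinjiang Medical University, Urumqi, Xinjiang 830011 [2] School of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, Xinjiang 830011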

Source: 《现代预防医学》 (Modern Preventive Medicine), 2017, Issue 21, pp. 3850-3852, 3862 (4 pages in total)

Funding: Xinjiang Science and Technology Aid Project (2016E02082); National Natural Science Foundation of China (71663053)

Abstract: Objective (1) Given the class imbalance typical of medical data, and taking metabolic syndrome as an example, to compare the performance of a plain decision tree with that of AdaBoost + decision tree in classifying metabolic syndrome, thereby establishing the advantages of AdaBoost + decision tree for mining imbalanced medical data and providing a methodological reference for computer-aided diagnosis of metabolic syndrome. (2) To explore the influencing factors of metabolic syndrome with a decision tree. Methods AdaBoost was used to balance the metabolic syndrome data, and decision tree modelling performance was compared before and after balancing; the models were evaluated with F-value, G-mean and AUC. Results (1) Compared with the plain decision tree, AdaBoost + decision tree raised the F-value by 6.3%, G-mean by 3.5% and AUC by 0.4%, indicating respectively that recognition of patients with metabolic syndrome improved by 6.3%, overall classification accuracy improved by 3.5%, and the comprehensive classification ability of the model improved by 0.4%. (2) The decision tree analyses consistently showed that fasting plasma glucose (FPG), high-density lipoprotein (HDL-C), systolic blood pressure (SBP), age and body mass index (BMI) were the main influencing factors of metabolic syndrome. In addition, the decision tree in this study indicated that metabolic syndrome was present if FPG > 6.02, BMI > 24.99, SBP > 139 and age ≤ 46, and absent if FPG ≤ 6.02, HDL-C ≤ 0.99, BMI ≤ 24.99 and age ≤ 61. Conclusion AdaBoost + decision tree outperformed the plain decision tree, and the influencing factors identified by the decision tree agree with those reported in related specialist studies.
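The abstract names its three evaluation measures and quotes two decision-tree rules without defining them. The sketch below shows one common way to compute such measures and to encode the two rules. It is not the authors' implementation: the function names are hypothetical, F-value is assumed to be the F1 score of the positive (metabolic syndrome) class, G-mean is assumed to be the geometric mean of sensitivity and specificity, and AUC is computed by pairwise ranking. The record does not state units or how the remaining tree branches behave, so cases outside the two quoted rules return `None`.

```python
import math


def f_value(tp, fp, fn, beta=1.0):
    """F-measure of the positive class (beta=1 gives F1; an assumption here)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rule_predict(fpg, bmi, sbp, hdl_c, age):
    """Encode only the two rules quoted in the abstract.

    Returns 1 (metabolic syndrome), 0 (no metabolic syndrome), or None when
    neither quoted rule applies; the rest of the tree is not given in the record.
    """
    if fpg > 6.02 and bmi > 24.99 and sbp > 139 and age <= 46:
        return 1
    if fpg <= 6.02 and hdl_c <= 0.99 and bmi <= 24.99 and age <= 61:
        return 0
    return None
```

On imbalanced data these measures are generally preferred to plain accuracy because F-value and G-mean both fall sharply when the minority class is poorly recognised, whereas accuracy can stay high if a model simply predicts the majority class.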

Keywords: metabolic syndrome; AdaBoost; decision tree; imbalanced dataset

Classification code: R195 [Medicine and Health: Health Statistics]

 
