Comparison of the effects of decision tree model and Logistic regression analysis model on identifying risk factors of hypertension  Cited by: 24

Authors: YAN Rui-ping; WANG Xi-liang; YAO Fen-xia; ZHANG Wei-dong[1] (College of Public Health, Zhengzhou University, Zhengzhou 450000, China; Chronic Disease Department, Qingfeng County Center for Disease Control and Prevention, Puyang 457300, China)

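Affiliations: [1] Department of Epidemiology, College of Public Health, Zhengzhou University, Zhengzhou 450000; [2] Chronic Disease Department, Qingfeng County Center for Disease Control and Prevention, Puyang 457300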

Source: Chinese Journal of Disease Control & Prevention, 2022, No. 2, pp. 218-222 (5 pages)

Abstract: Objective: To analyze the risk factors of hypertension among residents of Qingfeng County using a decision tree model and a Logistic regression analysis model, and to compare the two methods. Methods: A multi-stage stratified cluster sampling method was used to select 4 087 permanent residents aged 15-74 years in Qingfeng County for a survey. Decision tree and Logistic regression analysis models were then established. Results: Both models showed that older age, central obesity, education below junior middle school, rural residence, diabetes, smoking, drinking, and a family history of hypertension were risk factors for hypertension. The overweight/obesity variable was included in the decision tree model but eliminated from the Logistic regression analysis model; collinearity diagnosis indicated strong collinearity between the central obesity and overweight/obesity variables. Both the area under the curve (AUC) and the integrated discrimination improvement (IDI) indicated that the decision tree model predicted hypertension slightly better than the Logistic regression analysis model. Conclusion: The predictive ability of the decision tree model is slightly higher than that of the Logistic regression analysis model. The decision tree model is feasible and intuitive for analyzing the risk factors of hypertension and is not affected by collinearity between variables. The Logistic regression analysis model can fully show the quantitative dependence of the dependent variable on the independent variables; it therefore complements the decision tree model, and the two can be combined to describe the risk factors of hypertension.
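
The abstract compares the two models by AUC and IDI. As a minimal sketch (not the authors' code, software, or data), both metrics can be computed from each model's predicted probabilities: AUC as the probability that a randomly chosen hypertensive participant receives a higher predicted risk than a randomly chosen non-hypertensive one, and IDI as the gain in discrimination slope (mean predicted risk in cases minus mean predicted risk in non-cases) of one model over the other. The function names and the toy probabilities below are hypothetical illustrations.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def split_by_outcome(y: Sequence[int], p: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split predicted risks into cases (y == 1) and non-cases (y == 0)."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    non_cases = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    return cases, non_cases


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case is ranked above a random non-case; ties count 1/2."""
    cases, non_cases = split_by_outcome(y, p)
    score = 0.0
    for pc in cases:
        for pn in non_cases:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(non_cases))


def discrimination_slope(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean predicted risk in cases minus mean predicted risk in non-cases."""
    cases, non_cases = split_by_outcome(y, p)
    return mean(cases) - mean(non_cases)


def idi(y: Sequence[int], p_new: Sequence[float], p_ref: Sequence[float]) -> float:
    """IDI of p_new over p_ref: the difference between their discrimination slopes."""
    return discrimination_slope(y, p_new) - discrimination_slope(y, p_ref)


if __name__ == "__main__":
    # Hypothetical toy data (NOT the study's data): 1 = hypertensive, 0 = not.
    y = [1, 1, 1, 0, 0, 0]
    p_tree = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]     # e.g. leaf-node risks from a tree
    p_logit = [0.8, 0.5, 0.4, 0.45, 0.35, 0.3]  # e.g. fitted Logistic probabilities
    print("AUC tree :", auc(y, p_tree))
    print("AUC logit:", auc(y, p_logit))
    print("IDI tree vs logit:", idi(y, p_tree, p_logit))
```

Counting ties as 1/2 matters for a decision tree, because its predicted probabilities come from a limited number of terminal nodes and therefore tie often. The pairwise loop costs O(cases x non-cases) comparisons, which is workable for a sample of about 4 000 people; a rank-based formula gives the same AUC more cheaply on larger data.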

Keywords: hypertension; Logistic regression analysis model; decision tree model; risk factors

CLC number: R181 [Medicine and Health: Epidemiology]

 
