Influential factors of hypertension based on random forest and logistic regression  Cited by: 6

Authors: 王福成 (WANG Fu-cheng), 齐平 (QI Ping)[2], 蒋剑军 (JIANG Jian-jun)[2], 黄永 (HUANG Yong)[3], 杨晓玲 (YANG Xiao-ling)[4]

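Affiliations: [1] School of Management, Hefei University of Technology, Hefei, Anhui 230009, China; [2] Institute of Service Computing, Tongling University, Tongling, Anhui 244000, China; [3] Tongling Center for Disease Control and Prevention, Tongling, Anhui 244000, China; [4] Tianqiao Community Health Service Station, Tongling, Anhui 244000, China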

Source: Modern Preventive Medicine (《现代预防医学》), 2020, No. 13, pp. 2310-2313 (4 pages)

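Funding: Anhui Province Domestic and Overseas Visiting and Research Program for Outstanding Young Backbone Talents in Universities (gxgnfx2019041); Tongling University Natural Science Foundation Project (2019tlxy34); Anhui Province Data Science and Big Data Technology Teaching Team (2019jxtd105); Key Project of the Natural Science Foundation of Anhui Higher Education Institutions (KJ2019A0704).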

Abstract: Objective To identify and analyze the factors influencing hypertension and the interaction effects among them, given the many candidate factors and the limited number of valid samples in the physical examination data of residents of Tianqiao Community, Tongling, so as to provide a reference for hypertension intervention.
Methods Physical examination data for 801 residents of the community in 2017 were selected as the study sample. A random forest was used to screen for the features with the highest importance scores. These features were entered into a logistic complete quadratic regression model, and stepwise regression was used to analyze the influencing factors and the interaction effects among them.
Results The random forest model had an accuracy of 83.67%. The top 10 features by importance were age, diabetes, exercise frequency, body mass index, total cholesterol, smoking status, drinking status, central obesity, triglycerides, and blood urea ammonia. The logistic complete quadratic regression model had an accuracy of 84.17% and output 2 main effects and 8 quadratic interaction effects. Among the main effects, age and exercise frequency were statistically significant (P<0.05). The features involved in statistically significant (P<0.05) quadratic interaction effects were age, diabetes, body mass index, total cholesterol, smoking status, drinking status, triglycerides, and blood urea ammonia.
Conclusion Combining random forest with a logistic complete quadratic regression model addresses the difficulty that classical methods have in extracting interaction effects from data with many factors and limited samples. The approach identifies both the factors influencing hypertension and the interaction effects among them, and provides useful guidance for hypertension intervention.
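
The abstract does not spell out the form of the "complete quadratic" model. As a rough illustration only, the sketch below assumes it means main effects, all pairwise interactions, and squared terms, and counts how many candidate terms the 10 screened features would produce. The snake_case feature labels and the helper names `full_quadratic_terms` and `expand_row` are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# The 10 features the abstract lists as most important in the random forest.
# The snake_case labels are illustrative, not the authors' variable names.
FEATURES = [
    "age", "diabetes", "exercise_frequency", "body_mass_index",
    "total_cholesterol", "smoking_status", "drinking_status",
    "central_obesity", "triglycerides", "blood_urea_ammonia",
]


def full_quadratic_terms(names, include_squares=True):
    """Return term labels for a quadratic model (intercept not included).

    Assumption: "complete quadratic" = main effects + pairwise
    interactions + squared terms. Set include_squares=False to drop
    the squared terms.
    """
    terms = list(names)                                       # main effects
    terms += [f"{a}:{b}" for a, b in combinations(names, 2)]  # interactions
    if include_squares:
        terms += [f"{a}^2" for a in names]                    # squared terms
    return terms


def expand_row(row, names, include_squares=True):
    """Expand one record (dict: feature -> number) into the same column order."""
    values = [row[n] for n in names]
    out = list(values)
    out += [row[a] * row[b] for a, b in combinations(names, 2)]
    if include_squares:
        out += [v * v for v in values]
    return out


terms = full_quadratic_terms(FEATURES)
print(len(terms))  # 10 main effects + 45 interactions + 10 squares = 65
```

Under this assumption, 10 features already yield 65 candidate terms for 801 records, which suggests why features are screened before the quadratic model is fitted and why stepwise selection is then used to reduce the terms further.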

Keywords: hypertension; random forest; logistic regression; influencing factors; interaction effects

Classification number: R544.11 [Medicine and Health: Cardiovascular Diseases]

 
