Machine learning-driven risk assessment of coronary heart disease: Analysis of NHANES data from 1999 to 2018


Authors: LU Jin; HU Haochang; XIU Jiaming; YANG Yanfang; ZHU Qifeng[1,2,5,6]; DAI Hanyi; LIU Xianbao; WANG Jian'an[1,2,5,6]

Affiliations: [1] Department of Cardiology, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009; [2] State Key Laboratory of Transvascular Implantation Devices, Hangzhou 310009; [3] Department of Cardiology, Longyan First Affiliated Hospital of Fujian Medical University, Longyan, Fujian 364000; [4] Department of Cardiology, Provincial Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou 350001; [5] Key Laboratory of Cardiovascular Disease Diagnosis and Treatment of Zhejiang Province, Hangzhou 310009; [6] Binjiang Institute of Zhejiang University, Hangzhou 310053, China

Source: Journal of Central South University: Medical Science, 2024, No. 8, pp. 1175-1186 (12 pages)

Funding: National Natural Science Foundation of China (82271606, 81770252).

Abstract: Objective: The high incidence of coronary artery heart disease (CHD) poses a significant burden and challenge to public health systems worldwide. Effective prevention and early diagnosis of CHD have become key strategies for alleviating this burden. This study explores the application of advanced machine learning techniques to improve the accuracy of early screening and risk assessment for CHD.

Methods: A total of 49490 study subjects from the National Health and Nutrition Examination Survey (NHANES) database from 1999 to 2018 were included. The dataset was randomly divided into training (70%) and testing (30%) sets. The dependent variable (outcome variable) was whether the subjects had been informed of a CHD diagnosis, which divided them into a CHD group and a non-CHD group. After a review of the literature on CHD risk factors, 68 independent variables were included. The variable characteristics of the study subjects were analyzed and compared between the CHD and non-CHD groups. Two machine learning algorithms, random forest (randomForest_4.7-1.1) and XGBoost (xgboost_1.7.7.1), were used for variable selection. The top 10 variables by importance from each algorithm were analyzed together, and the variables identified by both algorithms were selected. A generalized linear model was used to analyze the relationships between the variables and CHD, and classical logistic regression was used to construct the CHD risk prediction model. The model's ability to distinguish CHD from non-CHD individuals was assessed with the area under the receiver operating characteristic (ROC) curve (AUC); calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test, which evaluates the agreement between predicted values and actual CHD proportions; and decision curve analysis was applied to evaluate the clinical benefit of the model's risk prediction. Finally, a nomogram was constructed to present the risk score of the final model visually.

Results: The mean age of the overall population was (49.53±18.31) years, and 51.8% were male. Compared with the non-CHD group, the CHD group was older [(69.05±11.32) years vs (48.67±18.07) years, P<0.001] and had a higher proportion of women (67.1% vs 47.4%, P<0.001), and the 2 groups differed significantly in the classical cardiovascular risk factors of body mass index, systolic blood pressure, diastolic blood pressure, and smoking (all P<0.001). The 2 groups also differed significantly in non-classical cardiovascular factors, including energy intake, vitamin E, vitamin K, calcium, phosphorus, magnesium, zinc, copper, sodium, potassium, and selenium (all P<0.05). The analysis ultimately identified 6 … associated with C…
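The variable-selection and evaluation steps named in the Methods can be illustrated with a minimal sketch. It is written in standard-library Python rather than the R packages the abstract cites, so it is not the authors' code or workflow. The function names, the toy variable names, and the grouping into 10 risk bins are all assumptions made for illustration. The sketch covers the 7:3 split, the intersection of two top-10 importance rankings, the AUC, the Hosmer-Lemeshow statistic, and the net benefit used in decision curve analysis.

```python
import random


def split_train_test(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of rows and split it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def shared_top_k(rank_a, rank_b, k=10):
    """Variables in the top k of both importance rankings, in rank_a order."""
    top_b = set(rank_b[:k])
    return [name for name in rank_a[:k] if name in top_b]


def auc(y_true, y_score):
    """AUC as the chance that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-sorted groups.

    Each group contributes (O - E)^2 / (E * (1 - E / n_g)), where O is the
    observed number of cases, E the sum of predicted risks, n_g the group size.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        denom = expected * (1 - expected / len(chunk))
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of treating everyone at or above threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical importance rankings; the names are illustrative only.
    rf_rank = ["age", "sbp", "bmi", "zinc"]
    xgb_rank = ["bmi", "age", "sodium", "sbp"]
    print(shared_top_k(rf_rank, xgb_rank, k=3))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Taking only the variables that both rankings place in their top 10 keeps predictors whose importance does not depend on one algorithm. The selected variables would then be passed to a logistic regression, which this sketch does not fit.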

Keywords: coronary heart disease; machine learning; National Health and Nutrition Examination Survey; risk assessment; risk factors

Classification: R541.4 [Medicine and Health: Cardiovascular Diseases]; TP181 [Medicine and Health: Internal Medicine]

 
