Authors: ZHANG Ruimin (张瑞敏); WANG Keke (王科科)[2]; LI Jinbo (李金波); CHEN Zhuanzhuan (陈转转); YANG Hailan (杨海澜)[3]; WU Weiwei (邬惟为); FENG Yongliang (冯永亮); WANG Suping (王素萍); ZHANG Xinri (张新日)
Affiliations: [1] Department of Epidemiology, School of Public Health, Shanxi Medical University, Taiyuan 030001, China; [2] Health Management Center, the First Hospital of Shanxi Medical University, Taiyuan 030001, China; [3] Department of Obstetrics and Gynecology, the First Hospital of Shanxi Medical University, Taiyuan 030001, China; [4] Department of Pulmonary and Critical Care Medicine, the First Hospital of Shanxi Medical University, Taiyuan 030001, China
Source: Chinese Journal of Disease Control & Prevention (《中华疾病控制杂志》), 2023, No. 8, pp. 922-927, 962 (7 pages)
Funding: Youth Science Research Project of the Shanxi Province Basic Research Program (20210302124581); Shanxi Province Hundred Talents Program.
Abstract: Objective To evaluate the performance of six models in predicting small for gestational age (SGA): five machine learning models, namely extreme gradient boosting (XGBoost), support vector machine (SVM), naive Bayes, gradient boosting decision tree (GBDT) and k-nearest neighbor (KNN), and a traditional logistic regression model.
Methods A total of 9972 pregnant women who were hospitalized and gave birth in the Department of Obstetrics of the First Hospital of Shanxi Medical University from March 2012 to September 2016 were enrolled. Data were collected through questionnaire surveys and from the hospital information system. According to delivery outcome, the women were divided into an SGA group (n=1124) and a non-SGA group (n=8848), and the data were split into a training set and a test set at a ratio of 75:25. Multivariable logistic regression was used to screen risk factors. Prediction models were then built with XGBoost, SVM, naive Bayes, GBDT, KNN and traditional logistic regression, and their predictive performance was compared using the area under the receiver operating characteristic curve (AUC), accuracy, precision and other metrics.
Results Logistic regression identified seven variables, including gestational hypertension and eclampsia, as factors associated with SGA. With these variables entered into the prediction models, the SVM model performed best, with the highest AUC (0.72) and an accuracy of 71%. The traditional logistic regression model performed slightly worse, with an AUC of 0.71 and an accuracy of 66%.
Conclusions SGA risk prediction models built with machine learning algorithms, especially SVM, performed well and can effectively predict the occurrence of SGA in Shanxi Province, providing a reference for the primary prevention of SGA.
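The evaluation described in the Methods (a 75:25 train/test split, then AUC, accuracy and precision on the test set) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the helper names `stratified_split`, `roc_auc`, `accuracy` and `precision` are hypothetical, and stratifying the split by SGA status and using a 0.5 decision threshold are assumptions the abstract does not state. Only the group sizes (1124 SGA, 8848 non-SGA) come from the study; the eight scores in the example are invented. The sketch also shows why AUC is reported alongside accuracy: with only about 11% SGA cases, always predicting non-SGA would already give 8848/9972 ≈ 88.7% accuracy.

```python
"""Illustrative evaluation helpers (hypothetical names, not the authors' code).

Assumptions: label 1 = SGA, 0 = non-SGA; the 75:25 split is stratified by label;
predicted probabilities are turned into classes with a 0.5 threshold.
"""
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices into (train, test), keeping the SGA proportion in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def roc_auc(y_true, scores):
    """AUC = probability that a random SGA case scores higher than a random non-SGA case.

    Ties count as 1/2 (Mann-Whitney formulation); O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    """Share of all cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Share of cases predicted as SGA that really are SGA (0.0 if none predicted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)
    return tp / predicted_pos if predicted_pos else 0.0


if __name__ == "__main__":
    # Group sizes from the abstract; only the counts are real, no patient data.
    labels = [1] * 1124 + [0] * 8848
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))  # 7479 2493

    # Majority-class baseline: always predict non-SGA.
    print(round(8848 / 9972, 3))  # 0.887

    # Made-up scores for eight cases, only to exercise the metrics.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    scores = [0.9, 0.3, 0.6, 0.65, 0.2, 0.4, 0.1, 0.5]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(roc_auc(y_true, scores), accuracy(y_true, y_pred), precision(y_true, y_pred))
```

In practice, scikit-learn's `roc_auc_score`, `accuracy_score` and `precision_score`, together with `train_test_split(..., stratify=labels)`, compute the same quantities; the hand-written versions only make the definitions explicit.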
Keywords: small for gestational age; machine learning; risk prediction model; logistic regression model
CLC number: R173 [Medicine and Health / Maternal and Child Health Care]