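Affiliations: [1] Center of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070 [2] China National Clinical Research Center for Neurological Diseases [3] Research Unit of Artificial Intelligence in Cerebrovascular Disease, Chinese Academy of Medical Sciences [4] Chinese Institute for Brain Research, Beijing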
Source: Chinese Journal of Stroke (《中国卒中杂志》), 2021, Issue 9, pp. 895-900 (6 pages)
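Funding: Beijing Natural Science Foundation (Z200016); National Natural Science Foundation of China (92046016); National Key R&D Program of China, 13th Five-Year Plan (2017YFC131901); Beijing Municipal Science and Technology Commission, Special Project for Pharmaceutical Collaborative Science and Technology Innovation Research (Z201100005620010); Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2019-I2M-5-029); Beijing Young Top-notch Talent Program (2018000021223ZK03).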
Abstract:
Objective To establish machine learning-based models for predicting functional outcome after ischemic stroke, and to provide a scientific basis for stratified patient management.
Methods Patients with ischemic stroke within 7 days of onset in the China National Stroke Registry Ⅱ (CNSR Ⅱ) database were selected as study subjects. Candidate predictors were screened by stepwise regression for the logistic regression model and by the Boruta algorithm for the machine learning models. Four outcome prediction models were built using logistic regression and three machine learning methods (CatBoost, XGBoost and LightGBM), and their value in predicting 3-month functional outcome (poor outcome defined as mRS>2) was compared.
Results A total of 14885 patients with ischemic stroke were included, with a mean age of 64.34±11.71 years; 63.96% (9521/14885) were male. Patients were randomly divided at a ratio of 8:2 into a training set (n=11908) and a test set (n=2977); the rates of poor 3-month functional outcome in the two sets were 17.36% and 17.06%, respectively (P=0.7045). Multivariate logistic regression analysis identified the following predictors for the model: age (OR 1.05, 95%CI 1.04-1.05, P<0.0001), male sex (OR 0.77, 95%CI 0.69-0.86, P<0.0001), diabetes (OR 1.16, 95%CI 1.00-1.35, P=0.0497), history of cerebrovascular disease (OR 1.53, 95%CI 1.37-1.70, P<0.0001), concurrent pneumonia (OR 2.45, 95%CI 2.03-2.95, P<0.0001), NIHSS score at admission (OR 1.14, 95%CI 1.13-1.15, P<0.0001), premorbid mRS score (OR 3.11, 95%CI 2.67-3.63, P<0.0001), LDL-C (OR 1.07, 95%CI 1.02-1.12, P=0.0057), fasting blood glucose (OR 1.03, 95%CI 1.01-1.06, P=0.0072) and white blood cell count (OR 1.07, 95%CI 1.05-1.09, P<0.0001). The AUCs of the logistic regression, CatBoost, XGBoost and LightGBM models for predicting functional outcome of ischemic stroke were 0.815 (0.801-0.829), 0.828 (0.814-0.841), 0.826 (0.812-0.839) and 0.822 (0.808-0.836), respectively. The CatBoost (P=0.0023) and XGBoost (P=0.0182) models predicted outcome better than the traditional logistic regression model.
Conclusions Machine learning-based models for predicting functional outcome after ischemic stroke had high predictive value.
Classification code: R743.3 [Medicine and Health: Neurology and Psychiatry]
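The abstract compares models by AUC on an outcome dichotomized at mRS>2. As a minimal sketch of what those two quantities mean (not the authors' code, which the abstract does not describe), the Python below labels poor outcome from mRS and computes AUC as the fraction of poor/good patient pairs in which the poor-outcome patient received the higher predicted risk. The function names `poor_outcome` and `auc`, and the toy mRS scores and predicted risks, are hypothetical illustrations and are not taken from CNSR Ⅱ.

```python
from typing import Sequence


def poor_outcome(mrs: int) -> int:
    """Return 1 for poor outcome (mRS > 2, the abstract's definition), else 0."""
    return 1 if mrs > 2 else 0


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (poor, good) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient in each outcome class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
mrs_scores = [0, 1, 4, 2, 5, 3]
predicted_risk = [0.10, 0.30, 0.80, 0.40, 0.35, 0.90]

labels = [poor_outcome(m) for m in mrs_scores]
toy_auc = auc(labels, predicted_risk)  # 8 of 9 pairs are ranked correctly
print(f"AUC = {toy_auc:.3f}")
```

This pairwise form is quadratic in the number of patients, so it suits small illustrations; for a test set of 2977 patients a rank-based implementation from a statistics library would be the more practical choice.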