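Affiliations: [1] Department of Neurology, the Affiliated Hospital of Qingdao University, Qingdao 266035; [2] Department of Critical Care Medicine, Affiliated Hospital of Jining Medical University, Jining 272030; [3] Data Center, Affiliated Hospital of Jining Medical University, Jining 272030; [4] Department of Neurology, Affiliated Hospital of Jining Medical University, Jining 272030
Source: Chinese Journal of Behavioral Medicine and Brain Science, 2024, Issue 6, pp. 505-512 (8 pages)
Funding: National Natural Science Foundation of China (81901228); Jining Key Research and Development Program, Soft Science Project (2019SMNS002).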
Abstract: Objective To use machine learning to predict the risk of cryptogenic stroke (CS) in people with right-to-left shunt (RLS), and to provide an accurate and efficient approach to CS prediction. Methods Clinical data were retrospectively analyzed for 289 people with RLS who had a positive contrast-enhanced transcranial Doppler (c-TCD) bubble test and were treated in the Department of Neurology, Laoshan Campus, the Affiliated Hospital of Qingdao University, from January 2018 to September 2023. The data covered demographic information, medical history, laboratory test indicators, diagnosis, and treatment. The dataset was randomly divided into a training set and a testing set at a ratio of 8:2 using the machine learning function train_test_split(). Risk prediction models for CS in people with RLS were built with Logistic regression, decision tree, random forest, extreme gradient boosting, artificial neural network, gradient boosting, extra trees, and adaptive boosting algorithms. Model performance was evaluated with receiver operating characteristic (ROC) curves and the area under the curve (AUC), the confusion matrix, precision, recall, accuracy, F1 score, calibration curves, and decision curve analysis. The best-performing model was interpreted using feature importance and SHAP values. The t-test, Mann-Whitney U test, and χ² test were performed in SPSS 25.0. The DeLong test was used to compare the AUC of two models. Results Of the 289 people with RLS, 166 (57.5%) had CS and 123 (42.5%) did not. Blood biochemical indicators such as D-dimer, mean platelet volume, and fibrinogen were higher in CS patients than in non-CS patients (all P<0.01); no variable differed significantly between the training and testing sets (all P>0.05). For CS risk prediction on the testing set, the random forest model achieved the highest AUC (0.885), precision (0.806), recall (0.879), accuracy (0.810), and F1 score (0.841). The calibration curve of the random forest model was the closest to the reference line, and decision curve analysis showed that the random forest model had a greater net benefit. Interpretability analysis identified mean platelet volume, D-dimer, international normalized ratio, body mass index, and age as high-risk factors. Conclusion The random forest-based prediction tool performed well and showed high accuracy in predicting CS risk in people with RLS.
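As a reading aid for the metrics above, the following minimal Python sketch shows how precision, recall, accuracy, and F1 follow from confusion-matrix counts, and recomputes F1 from the reported precision and recall. The class `ConfusionCounts`, the helper `f1_score`, and the counts in `example` are assumptions made for illustration: the abstract does not report the confusion matrix, and the counts shown are only one hypothetical set that is arithmetically consistent with the reported figures for a test set of 58 subjects (about 20% of 289).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with CS as the positive class."""
    tp: int  # CS predicted as CS
    fp: int  # non-CS predicted as CS
    fn: int  # CS predicted as non-CS
    tn: int  # non-CS predicted as non-CS

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# F1 recomputed from the precision and recall reported in the abstract.
reported_f1 = f1_score(0.806, 0.879)

# HYPOTHETICAL counts, not taken from the paper: one possible confusion
# matrix that reproduces the reported metrics to three decimals.
example = ConfusionCounts(tp=29, fp=7, fn=4, tn=18)
example_f1 = f1_score(example.precision(), example.recall())

if __name__ == "__main__":
    print(f"F1 from reported precision/recall: {reported_f1:.3f}")
    print(f"precision={example.precision():.3f} "
          f"recall={example.recall():.3f} "
          f"accuracy={example.accuracy():.3f} "
          f"F1={example_f1:.3f}")
```

The recomputed F1 agrees with the reported 0.841, so the precision, recall, and F1 values in the abstract are mutually consistent.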
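Keywords: cryptogenic stroke; right-to-left shunt; machine learning; prediction model; random forest model
Classification: R743.3 [Medicine and Health: Neurology and Psychiatry]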