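Affiliations: [1] Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong; [2] Healthcare Big Data Research Institute, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250003, Shandong; [3] Qilu Hospital of Shandong University, Jinan 250012, Shandong; [4] Network Information Office, Boxing County Health Security Center, Binzhou 256500, Shandong; [5] Data Science Institute, Shandong University, Jinan 250100, Shandong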
Source: Journal of Shandong University: Health Sciences, 2024, No. 11, pp. 73-84 (12 pages)
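Funding: Key Program of the National Natural Science Foundation of China (82330108); General Program of the National Natural Science Foundation of China (82173625); Key Research and Development Program of Shandong Province (2021SFGC0504); General Program of the China Postdoctoral Science Foundation (2022M721921); Youth Fund of the Natural Science Foundation of Shandong Province (ZR2023QH236).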
Abstract: Objective To develop a screening model for ischemic stroke from large-scale electronic health records, drawing on the strength of Bayesian networks in reasoning under uncertainty. Methods The derivation cohort came from the Cheeloo Lifespan Electronic Health Research Data-library (Cheeloo LEAD) and was split into training and testing sets at a 7:3 ratio. The external validation cohort came from the Boxing Collaboration Center Database of the National Healthcare Big Data Research Institute (Boxing Database). Univariate Logistic regression was used to select screening factors significantly associated with the onset of ischemic stroke. These factors were then modeled with a Bayesian network: the tabu search algorithm was used for structure learning and Bayesian estimation for parameter learning, yielding the final ischemic stroke screening model. Model performance was evaluated in terms of discrimination and calibration and compared with that of a traditional Logistic regression model for ischemic stroke screening. Results The derivation cohort included 1,067,609 individuals, of whom 31,019 had ischemic stroke. The external validation cohort included 386,773 individuals, of whom 13,393 had ischemic stroke. Univariate screening identified 67 screening factors. The final Bayesian network comprised 68 nodes and 440 directed edges. The parent nodes of the ischemic stroke node were age; hypertensive diseases; ischemic heart diseases; chronic lower respiratory diseases; other cerebrovascular diseases; episodic and paroxysmal disorders; and symptoms and signs involving cognition, perception, emotional state and behavior. The AUCs for the training set, testing set, and external validation cohort were 0.840 (95% CI: 0.838-0.843), 0.839 (95% CI: 0.836-0.843), and 0.811 (95% CI: 0.808-0.814), respectively, indicating good discrimination; calibration was also good. With missing data, the screening model still outperformed the traditional Logistic regression model. Conclusion Drawing on the strength of Bayesian networks in reasoning under uncertainty, this study built a screening model for ischemic stroke. The model shows good discrimination and calibration and offers a convenient, efficient method for early ischemic stroke screening.
Keywords: electronic health records; Bayesian network; Logistic regression; ischemic stroke; screening model
Classification number: R743.3 [Medicine and Health: Neurology and Psychiatry]
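Illustrative note: the abstract names two steps that a small example can make concrete, Bayesian estimation of network parameters and AUC as the measure of discrimination. The sketch below is not the authors' implementation. It assumes a symmetric Dirichlet prior (the hypothetical `alpha` parameter), uses an invented six-row toy data set with a single parent node, and takes the network structure as given, whereas the study learned its structure with tabu search. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, child, parents, child_states, alpha=1.0):
    """Posterior-mean conditional probability table for `child` given `parents`.

    Assumes a symmetric Dirichlet(alpha) prior over the child's states for
    every parent configuration seen in the data.
    """
    counts = defaultdict(Counter)
    for row in records:
        key = tuple(row[p] for p in parents)
        counts[key][row[child]] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + alpha * len(child_states)
        cpt[key] = {s: (c[s] + alpha) / total for s in child_states}
    return cpt


def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy records (not study data): one parent node, one outcome node.
records = [
    {"hypertension": 1, "stroke": 1},
    {"hypertension": 1, "stroke": 0},
    {"hypertension": 1, "stroke": 1},
    {"hypertension": 0, "stroke": 0},
    {"hypertension": 0, "stroke": 0},
    {"hypertension": 0, "stroke": 1},
]

cpt = estimate_cpt(records, "stroke", ["hypertension"], [0, 1])
# Score each record by its estimated probability of stroke given the parent.
scores = [cpt[(r["hypertension"],)][1] for r in records]
labels = [r["stroke"] for r in records]
toy_auc = auc(scores, labels)
print(cpt, toy_auc)
```

The prior pulls every estimate toward a uniform distribution, so a parent configuration with few observations does not produce probabilities of exactly 0 or 1. The pairwise AUC shown here is quadratic in sample size; on cohorts as large as those in the study, a rank-based computation would be the practical choice.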