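Affiliations: [1] Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191; [2] Key Laboratory of Epidemiology of Major Diseases (Peking University), Ministry of Education, Beijing 100191; [3] Inner Mongolia Autonomous Region Center for Disease Control and Prevention, Hohhot 010031; [4] Breast Cancer Prevention and Treatment Center, Peking University Cancer Hospital & Beijing Institute for Cancer Research, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Beijing 100142; [5] Department of Breast Oncology, Peking University Cancer Hospital & Beijing Institute for Cancer Research, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Beijing 100142; [6] 北京帕云医疗科技有限公司 (a medical technology company), Beijing 100080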
Source: Journal of Peking University: Health Sciences (《北京大学学报(医学版)》), 2023, No. 3, pp. 471-479 (9 pages)
Funding: National Natural Science Foundation of China (82173616).
Abstract:
Objective: To develop and validate a three-year risk prediction model for new-onset cardiovascular disease (CVD) among female patients with breast cancer.
Methods: Using data from the Inner Mongolia Regional Healthcare Information Platform, female breast cancer patients over 18 years old who had received anti-tumor treatment were included. Candidate predictors were entered according to the results of a multivariable Fine & Gray model and then selected by Lasso regression. A Cox proportional hazards model, a Logistic regression model, a Fine & Gray model, a random forest model, and an XGBoost model were fitted on the training set and evaluated on the testing set. Discrimination was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and calibration was assessed by the calibration curve.
Results: A total of 19 325 breast cancer patients who had received anti-tumor treatment were included, with a mean age of (52.76±10.44) years. The median follow-up was 1.18 years [interquartile range (IQR): 2.71]. In total, 7 856 patients (40.65%) developed CVD within 3 years after the diagnosis of breast cancer. The predictors selected by Lasso regression were age at diagnosis of breast cancer, gross domestic product (GDP) of the place of residence, tumor stage, history of hypertension, ischemic heart disease, and cerebrovascular disease, type of surgery, type of chemotherapy, and type of radiotherapy. When survival time was not considered, the AUC of the XGBoost model was significantly higher than that of the random forest model [0.660 (95%CI: 0.644-0.675) vs. 0.608 (95%CI: 0.591-0.624), P<0.001] and that of the Logistic regression model [0.609 (95%CI: 0.593-0.625), P<0.001], and the Logistic regression and XGBoost models were better calibrated. When survival time was considered, the AUCs of the Cox proportional hazards model and the Fine & Gray model did not differ significantly [0.600 (95%CI: 0.584-0.616) vs. 0.615 (95%CI: 0.599-0.631), P=0.188], but the Fine & Gray model was better calibrated.
Conclusion: It is feasible to develop a risk prediction model for new-onset CVD in breast cancer patients based on regional healthcare data. When survival time is not considered, the Logistic regression and XGBoost models perform better; when survival time is considered, the Fine & Gray model performs better.
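The abstract evaluates models by AUC (discrimination) with 95% confidence intervals and by calibration curves. The following is a minimal Python sketch of how such metrics can be computed from predicted risks and observed outcomes; it is an illustration only, not the authors' code. The function names (`auc`, `bootstrap_auc_ci`, `calibration_table`), the percentile-bootstrap interval, the equal-size risk bins, and the synthetic data are all assumptions, since the abstract does not state how the intervals or curves were obtained.

```python
import random


def auc(y_true, y_score):
    """AUC as the rank-based (Mann-Whitney) statistic; tied scores share an average rank."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one non-case")
    pairs = sorted(zip(y_score, y_true))
    rank_sum_pos, i = 0.0, 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    auc(y_true, y_score)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[k] for k in idx]
        if 0 < sum(labels) < n:  # skip resamples that contain a single class
            stats.append(auc(labels, [y_score[k] for k in idx]))
    stats.sort()
    lo_idx = max(0, int(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


def calibration_table(y_true, y_prob, n_bins=10):
    """Mean predicted risk vs. observed event rate in equal-size risk groups."""
    order = sorted(range(len(y_prob)), key=lambda k: y_prob[k])
    n = len(order)
    rows = []
    for b in range(n_bins):
        chunk = order[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(y_prob[k] for k in chunk) / len(chunk)
            observed = sum(y_true[k] for k in chunk) / len(chunk)
            rows.append((mean_pred, observed))
    return rows


if __name__ == "__main__":
    # Synthetic, perfectly calibrated risks; NOT data from the study.
    rng = random.Random(42)
    risk = [rng.uniform(0.05, 0.8) for _ in range(500)]
    outcome = [1 if rng.random() < p else 0 for p in risk]
    low, high = bootstrap_auc_ci(outcome, risk, n_boot=200)
    print(f"AUC = {auc(outcome, risk):.3f} (95% CI: {low:.3f}-{high:.3f})")
    for mean_pred, observed in calibration_table(outcome, risk, n_bins=5):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}")
```

A well-calibrated model yields rows in which the predicted and observed columns are close, which corresponds to a calibration curve lying near the diagonal. For time-to-event models such as Cox or Fine & Gray, the observed 3-year risk would additionally need to account for censoring and competing events, which this sketch does not do.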