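Affiliations: [1] Department of Oncology Radiotherapy and Chemotherapy 1, North China University of Science and Technology Affiliated Hospital, Tangshan, Hebei Province 063000 [2] Department of Hematology 1, North China University of Science and Technology Affiliated Hospital
Source: Chinese Journal of Coal Industry Medicine (《中国煤炭工业医学杂志》), 2024, Issue 3, pp. 313-319 (7 pages)
Funding: Natural Science Foundation of Hebei Province (No. 20221520).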
Abstract: Objective To differentiate hypocellular myelodysplastic syndrome (hypo-MDS) from aplastic anemia (AA) using machine learning models, including Multi-Grained Cascade Forest (gcForest, also called deep forest), Broad Learning System (BLS), and Gradient Boosting Decision Tree (GBDT). Methods Basic information, medical history, and clinical examination data were retrospectively collected for patients with hypo-MDS or AA who were first diagnosed in the Department of Hematology, North China University of Science and Technology Affiliated Hospital, between January 1, 2008 and December 31, 2022. The final input variables were determined from factor analysis combined with a literature review and clinical experts' opinions. The subjects were randomly divided into a training set (70%) and a validation set (30%), and gcForest, BLS, and GBDT differential diagnosis models for hypo-MDS and AA were built. Model performance was compared using sensitivity, specificity, ROC curve, AUC, Brier score, calibration curve, and decision curve analysis (DCA), and the best-performing classification model was selected. Results Nine indicators were identified as model input variables: age, red blood cell count, hemoglobin content, neutrophils, early erythroblasts, intermediate erythroblasts, late erythroblasts, mature lymphocytes, and mature plasma cells. On the validation set, the accuracies of the gcForest, BLS, and GBDT models were 76.74%, 79.07%, and 83.92%, respectively; the sensitivities were 62.16%, 72.92%, and 87.69%; the specificities were 87.76%, 86.84%, and 80.77%; the Brier scores were 0.147, 0.143, and 0.119; and the AUC values were 0.767 (95% CI: 0.731~0.805), 0.785 (95% CI: 0.739~0.834), and 0.834 (95% CI: 0.808~0.861). The AUC of the GBDT model was higher than that of the gcForest model, and the difference was statistically significant (P<0.05). The calibration curve of the GBDT model was closer to the diagonal than those of the other two models, and its area under the clinical decision curve was the largest. Conclusion Among the three models, GBDT performed best for the differential diagnosis of hypo-MDS and AA.
Keywords: gradient boosting decision tree; hypocellular myelodysplastic syndrome; aplastic anemia; differential diagnosis
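
The abstract compares models by accuracy, sensitivity, specificity, and Brier score. As a reading aid, here is a minimal sketch of how these four quantities are conventionally computed from predicted probabilities. It is not the authors' code. The label coding (1 = hypo-MDS, 0 = AA), the 0.5 decision threshold, and the toy data are all assumptions made for illustration; the abstract does not state which class was treated as positive or which threshold was used.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true positives that were predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true negatives that were predicted negative."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data, NOT taken from the study.
# Assumed coding: 1 = hypo-MDS (positive class), 0 = AA.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_prob = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1, 0.3, 0.8]
# Assumed decision threshold of 0.5.
toy_pred = [1 if p >= 0.5 else 0 for p in toy_prob]

if __name__ == "__main__":
    print("accuracy   :", accuracy(toy_true, toy_pred))
    print("sensitivity:", sensitivity(toy_true, toy_pred))
    print("specificity:", specificity(toy_true, toy_pred))
    print("Brier score:", brier_score(toy_true, toy_prob))
```

A lower Brier score indicates predicted probabilities that lie closer to the observed outcomes, which is consistent with the abstract reporting the lowest Brier score (0.119) for the model whose calibration curve is closest to the diagonal.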