Comparison of discrimination performance between machine learning models and the Cox model for survival data, based on a benchmark experiment


Authors: MA Rong-ji; JIAO Zhi-gang; MIAO Peng-cheng; LU Bei-er; CHEN Hua-ling; QIAN Yong-kang; CHEN Bing-wei[1] (School of Public Health, Southeast University, Nanjing, Jiangsu 210009, China)

Affiliation: [1] School of Public Health, Southeast University, Nanjing, Jiangsu 210009, China

Source: Modern Preventive Medicine, 2023, No. 13, pp. 2344-2348, 2368 (6 pages)

Abstract: Objective: To compare the discrimination performance of the random survival forest model, the gradient boosting model, the extreme gradient boosting model, and the Cox proportional hazards regression model on survival data, and so provide a reference for the application of survival analysis methods. Methods: Within a benchmark experiment framework, 13 datasets were selected from the SEER database, the TCGA database, and R software packages. The three machine learning models and the Cox model were fitted on each dataset, Harrell's C-index obtained by nested cross-validation was used as the measure of discrimination performance, and the rank sum test was used to compare performance between models. Results: The C-index for each dataset fell mainly between 0.6 and 0.75. Results differed across individual datasets: differences in C-index between models were statistically significant in only some datasets, and no consistent pattern emerged. Across all datasets, and within subgroups such as high-censoring-rate and low-censoring-rate datasets, differences in C-index among the four methods were not statistically significant. Conclusion: Across different survival data analysis scenarios, the discrimination performance of the three machine learning models is similar to that of the traditional Cox model.
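For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how Harrell's C-index can be computed from right-censored data. It is not the authors' code: the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplifying assumption.

```python
from typing import Sequence


def harrell_c_index(
    time: Sequence[float],
    event: Sequence[int],
    risk: Sequence[float],
) -> float:
    """Share of comparable pairs whose predicted risks are ordered correctly.

    A pair (i, j) is comparable when subject i has an observed event and
    a strictly shorter follow-up time than subject j. The pair is
    concordant when subject i also has the higher predicted risk.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:  # tied times are skipped (simplification)
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: follow-up times, event indicators (1 = event,
# 0 = censored), and model risk scores (higher = earlier expected event).
toy_time = [2, 4, 6, 8]
toy_event = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
toy_c = harrell_c_index(toy_time, toy_event, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the 0.6 to 0.75 range reported above indicates moderate discrimination. In a benchmark of the kind described, a score like this would be computed on each outer test fold of the nested cross-validation.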

Keywords: benchmark experiment; survival analysis; random survival forest; gradient boosting; extreme gradient boosting

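Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; O212.1 [Automation and Computer Technology: Control Science and Engineering]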

 
