Application of models built with the R caret package in the diagnosis of primary lung cancer  Cited by: 1

Published English title: Application in primary lung cancer assisted diagnosis based on caret package in R


Authors: Lu Mingyang[1]; Liu Jian[1]; Zhou Yi[1]; Xu Bin[1]; Zheng Xiao[1]; Jiang Jingting[1]

Affiliation: [1] Department of Tumor Biological Diagnosis and Treatment, the Third Affiliated Hospital of Soochow University; Jiangsu Engineering Research Center for Tumor Immunotherapy; Institute of Cell Therapy, Soochow University, Changzhou 213003, China

Source: Chinese Journal of Experimental Surgery (中华实验外科杂志), 2019, No. 11, pp. 2096-2098 (3 pages)

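Funding: National Key R&D Program of China (2018YFC1313400); National Science and Technology Support Program (2015BAI12B12); National Natural Science Foundation of China (31729001, 31570877, 31570908); Jiangsu Province Key R&D Program Special Fund (BE2018645).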

Abstract: Objective: To build a diagnostic model for primary lung cancer and to assess the diagnostic value of related hematological markers. Methods: Random forest was used for feature selection, and a C5.0 classification model was built on the selected variables. Receiver operating characteristic (ROC) curves were used to analyze the diagnostic performance of each variable. Results: Random forest identified the 10 variables contributing most to the model: carcinoembryonic antigen (CEA), cytokeratin 19 fragment (CYFRA21-1), alkaline phosphatase (ALP), neuron-specific enolase (NSE), albumin (ALB), carbohydrate antigen 125 (CA125), red blood cell distribution width (RDW), alpha-fetoprotein (AFP), hematocrit (HCT), and neutrophil percentage (NEU%). A GBM model built on the training set gave an area under the ROC curve of 0.899 and an accuracy of 81.25%; the mean prediction accuracy over 1000 repetitions of the C5.0 model was 80.58%. According to the C5.0 classification, all 10 top-ranked variables differed significantly between benign controls and lung cancer (P<0.01). Conclusion: The diagnostic model for primary lung cancer performs better than any single marker and offers a new means of improving the efficiency of clinical lung cancer diagnosis.
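The abstract reports per-variable ROC analysis and accuracy figures but gives no computational detail. As an illustration only, the sketch below shows how an area under the ROC curve and an accuracy at a fixed cutoff can be computed for a single marker. It is written in Python, not the authors' R/caret code; the function names `roc_auc` and `accuracy_at`, the cutoff, and all marker values are hypothetical and synthetic, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(scores, labels, cutoff):
    """Fraction of cases classified correctly when score >= cutoff means positive."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Synthetic values for one hypothetical marker (1 = lung cancer, 0 = benign control).
marker = [1.2, 2.8, 5.0, 4.1, 6.0, 2.0, 7.3, 9.4]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

if __name__ == "__main__":
    print("AUC:", roc_auc(marker, labels))
    print("Accuracy at cutoff 4.0:", accuracy_at(marker, labels, 4.0))
```

The pairwise definition of AUC used here is equivalent to the area under the empirical ROC curve and is easy to check by hand on small samples; for large datasets a rank-based implementation would be faster.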

Keywords: primary lung cancer; diagnostic model; tumor markers

Classification: R [Medicine and Health]

 
