Construction of a prognostic prediction model for invasive lung adenocarcinoma based on machine learning algorithms

Authors: CUI Yanqi; YANG Jingrong; NI Lin; LIAN Duohuang; YE Shixin; LIAO Yi; ZHANG Jincan; ZENG Zhiyong [1,2] (Fuzong Clinical College of Fujian Medical University, Fuzhou, 350025, P.R. China; Department of Cardiothoracic Surgery, The 900th Hospital of the Joint Support Force, Fuzhou, 350025, P.R. China; Department of General Surgery, The 900th Hospital of the Joint Support Force, Fuzhou, 350025, P.R. China)

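Affiliations: [1] Fuzong Clinical College of Fujian Medical University, Fuzhou 350025 [2] Department of Cardiothoracic Surgery, The 900th Hospital of the Joint Support Force, Fuzhou 350025 [3] Department of General Surgery, The 900th Hospital of the Joint Support Force, Fuzhou 350025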

Source: Chinese Journal of Clinical Thoracic and Cardiovascular Surgery, 2025, Issue 1, pp. 80-86 (7 pages)

Abstract: Objective To identify prognostic biomarkers and new therapeutic targets for lung adenocarcinoma (LUAD), and on that basis to establish a prediction model for the survival of LUAD patients.
Methods Gene expression and clinicopathologic data for LUAD from The Cancer Genome Atlas (TCGA) were obtained from the UCSC database and subjected to an integrative bioinformatics analysis, including screening of differentially expressed genes (DEGs), Gene Ontology (GO) functional enrichment analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, and Gene Set Enrichment Analysis (GSEA). Cox regression and least absolute shrinkage and selection operator (LASSO) regression were used to build a gene-based risk assessment model, and a nomogram based on this model was constructed to predict 1-year, 2-year, 3-year, 5-year, and 10-year survival. The predictive ability of the model was evaluated with Kaplan-Meier survival curves, receiver operating characteristic (ROC) curves, and time-dependent ROC curves, and the model was then checked in a validation group.
Results Enrichment analysis of the DEGs between different-grade pathological subtypes of invasive LUAD showed that the 280 DEGs were mainly involved in biological processes such as metabolism of xenobiotics by cytochrome P450, natural killer cell-mediated cytotoxicity, antigen processing and presentation, and regulation of enzyme activity, which are closely related to tumor occurrence and development. Cox regression and LASSO regression yielded a risk prediction model consisting of a five-gene panel (MELTF, MAGEA1, FGF19, DKK4, C14ORF105). The area under the ROC curve (AUC) of the model was 0.675, and the time-dependent ROC analysis gave AUC values of 0.893, 0.713, and 0.632 for 1-year, 3-year, and 5-year survival, respectively, indicating that the risk model has good sensitivity and specificity. In the validation group, calibration curves and the concordance index (C-index) also indicated that the nomogram predicted survival well.
Conclusion The five-gene prediction model can serve as a practical and reliable tool for predicting survival in LUAD patients; it may help inform clinical decisions on individualized treatment and offers a new method for predicting patient prognosis.
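
As a rough illustration of how a gene-panel risk model of the kind described in the abstract is typically applied, the sketch below computes a linear risk score over the five named genes, splits patients at the median score, and evaluates ranking quality with Harrell's C-index. It is only a sketch under stated assumptions: the abstract does not report the model coefficients, so the values in `COEFS` are placeholders, the patient data are synthetic, and the function names are hypothetical rather than taken from the paper.

```python
from statistics import median

# Placeholder coefficients (assumption): the abstract lists the five genes
# but not their Cox/LASSO coefficients, so these numbers are illustrative.
COEFS = {
    "MELTF": 0.20,
    "MAGEA1": 0.10,
    "FGF19": 0.15,
    "DKK4": 0.05,
    "C14ORF105": -0.10,
}


def risk_score(expr):
    """Linear predictor: sum of coefficient * expression over the panel."""
    return sum(COEFS[gene] * expr[gene] for gene in COEFS)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def harrell_c_index(times, events, scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's follow-up time. The pair is
    concordant when the earlier-failing patient has the higher risk score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Synthetic expression values, follow-up times (months), and event flags
# (1 = death observed, 0 = censored). None of this is data from the study.
patients = [
    {"MELTF": 5.0, "MAGEA1": 2.0, "FGF19": 3.0, "DKK4": 1.0, "C14ORF105": 1.0},
    {"MELTF": 1.0, "MAGEA1": 0.0, "FGF19": 1.0, "DKK4": 0.0, "C14ORF105": 4.0},
    {"MELTF": 4.0, "MAGEA1": 1.0, "FGF19": 2.0, "DKK4": 2.0, "C14ORF105": 2.0},
    {"MELTF": 2.0, "MAGEA1": 1.0, "FGF19": 0.0, "DKK4": 0.0, "C14ORF105": 3.0},
]
times = [6, 60, 14, 10]
events = [1, 0, 1, 1]

scores = [risk_score(p) for p in patients]
groups = split_by_median(scores)
c_index = harrell_c_index(times, events, scores)

print(groups)
print(round(c_index, 3))
```

In practice the coefficients would come from the fitted Cox/LASSO model, and the high- and low-risk groups would then be compared with Kaplan-Meier curves; a C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking.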

Keywords: invasive lung adenocarcinoma; machine learning; biomarker; risk assessment model; nomogram

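Classification codes: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; R734.2 [Automation and Computer Technology: Control Science and Engineering]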

 
