Establishment and Verification of Prognostic Risk Model of Intrahepatic Cholangiocarcinoma Based on TCGA and GEO Database  Cited by: 3


Authors: MAO Jun; SHEN Xiu-fen[1]; MA Run[1]; HE Wei[1]; QU Qiao-li; HU Ying[1] (Department of Clinical Laboratory, the Second Affiliated Hospital of Kunming Medical University, Kunming 650101, China)

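Affiliation: [1] Department of Clinical Laboratory, the Second Affiliated Hospital of Kunming Medical University, Kunming 650101, China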

Source: Journal of Modern Laboratory Medicine, 2023, No. 3, pp. 40-46, 64 (8 pages in total)

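Funding: Yunnan Province High-Level Health Technical Talent Training Support Program (D-2018041); Kunming Medical University Master's Graduate Innovation Fund (2022S253): study on how downregulated CPT2 expression in cholangiocarcinoma increases cisplatin resistance and promotes tumor growth through the ROS/NF-κB pathway.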

Abstract: Objective: To construct a prognostic risk model for intrahepatic cholangiocarcinoma (ICCA) based on The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, and to screen for ICCA prognosis-related genes.

Methods: Data from 31 ICCA tissues and 9 para-carcinoma tissues in the TCGA database served as the training set, and data from 30 ICCA tissues and 27 para-carcinoma tissues in the GEO database served as the validation set. Differentially expressed genes were filtered with the R package "DESeq2" using the criteria absolute fold change > 2 and adjusted P < 0.05. Univariate Cox regression analysis was used to screen genes whose association with prognosis was statistically significant in both datasets, and LASSO regression analysis was used to construct the ICCA prognostic risk model. Risk scores were calculated for the training and validation sets, each set was divided into high- and low-risk groups by the median, and Kaplan-Meier survival curves and time-dependent receiver operating characteristic (ROC) curves were drawn. The risk score and clinicopathological information were analyzed by univariate and multivariate Cox regression and displayed as a nomogram to comprehensively evaluate and validate model performance. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Set Enrichment Analysis (GSEA), and single-sample Gene Set Enrichment Analysis (ssGSEA) were used to analyze the reasons for the prognostic difference between the high- and low-risk groups.

Results: A total of 2922 differentially expressed genes were identified in the TCGA data and 3075 in the GEO data (all P < 0.05). Univariate Cox regression analysis identified 68 genes in TCGA (HR = 0.13~7.2, all P < 0.05) and 413 genes in GEO (HR = 0.17~215.1, all P < 0.05). Nine genes were significantly associated with prognosis in both datasets: GOLGA7B, MTFR2, TPM2, PIWIL4, EPHX4, PRICKLE1, DIO2, FUT4, and COL4A3 (TCGA HR = 0.506~2.760, GEO HR = 0.428~1.992, all P < 0.05). LASSO regression produced a 6-gene prognostic risk model: risk score = 0.464 × expression of MTFR2 + 0.550 × expression of TPM2 - 0.511 × expression of PIWIL4 - 0.097 × expression of PRICKLE1 + 0.215 × expression of DIO2 - 0.313 × expression of COL4A3. The median risk score in the training set was 1.43. Kaplan-Meier survival analysis showed that overall survival was lower in the high-risk group than in the low-risk group (P < 0.001). The ROC curves indicated 1-, 3-, and 5-year AUCs of 0.971 (cutoff = 0.22), 0.921…
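The 6-gene risk score and the median-based grouping described in the abstract can be illustrated with a short Python sketch. The coefficients and the training-set median (1.43) are taken from the abstract; everything else is an assumption made for illustration: the function names `risk_score` and `risk_group` are hypothetical, expression values are assumed to be on whatever scale the authors used (the abstract does not state the normalization), and a score strictly above the cutoff is assumed to mean "high risk". The authors report working in R, so this is not their code.

```python
from typing import Mapping

# LASSO coefficients of the 6-gene model, as reported in the abstract.
COEFFICIENTS = {
    "MTFR2": 0.464,
    "TPM2": 0.550,
    "PIWIL4": -0.511,
    "PRICKLE1": -0.097,
    "DIO2": 0.215,
    "COL4A3": -0.313,
}

# Median risk score of the training set, as reported in the abstract.
TRAINING_MEDIAN = 1.43


def risk_score(expression: Mapping[str, float]) -> float:
    """Weighted sum of the six gene expression values (hypothetical helper)."""
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score: float, cutoff: float = TRAINING_MEDIAN) -> str:
    """Assign a group; 'strictly above the cutoff is high' is an assumption."""
    return "high" if score > cutoff else "low"


if __name__ == "__main__":
    # Made-up expression values, used only to exercise the functions.
    sample = {gene: 1.0 for gene in COEFFICIENTS}
    score = risk_score(sample)
    print(round(score, 3), risk_group(score))
```

Since the abstract says both sets were split by the median, the validation set may have used its own median; in that case a different `cutoff` would be passed to `risk_group`.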

Keywords: intrahepatic cholangiocarcinoma; bioinformatics; survival analysis; risk score; risk model

Classification: R735.8 [Medicine and Health: Oncology]; R730.43 [Medicine and Health: Clinical Medicine]

 
