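Affiliations: [1] Department of General Surgery, Air Force Hospital of the PLA Southern Theater Command, Guangzhou, Guangdong 510000; [2] Department of Neurology, Xijing Hospital, PLA Air Force Medical University, Xi'an, Shaanxi 710000; [3] Outpatient Department, Minggang Station Hospital, 1st Brigade, PLA Air Force Xi'an Flight Academy, Xinyang, Henan 463200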
Source: China Journal of General Surgery, 2022, No. 10, pp. 1355-1362 (8 pages)
Funding: National Natural Science Foundation of China (82100680).
Abstract:
Background and Aims: Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the second leading cause of cancer death worldwide. The latest guidelines recommend microsatellite instability (MSI) testing for all CRC patients. Patients with MSI often have deficient mismatch repair (dMMR). MSI/dMMR status has been used as a biomarker to predict a favorable response to immunotherapy and patient prognosis. However, MSI signature genes and their relationship with tumor-infiltrating immune cells have not been fully described. This study therefore used machine learning to identify novel MSI signature genes in CRC and to verify their diagnostic value and their relationship with immune cell infiltration.
Methods: Based on the inclusion and exclusion criteria, the GSE39582 dataset from the GEO database served as the training set and the COAD dataset from the TCGA database served as the external validation set. Two machine learning methods, LASSO regression and the SVM-RFE algorithm, were used to screen MSI signature genes in the GSE39582 CRC dataset. The selected genes were then validated in the TCGA CRC data. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to evaluate each gene's diagnostic performance for MSI. The CIBERSORT algorithm was used to estimate the immune cell composition of each tumor sample. Spearman correlation analysis was used to examine the relationship between the MSI signature genes and immune cells.
Results: The training set included 536 CRC patients, of whom 77 (14.37%) had high microsatellite instability (MSI-H). The validation set included 389 CRC patients, of whom 67 (17.22%) were MSI-H. In the baseline analysis, MSI-H/dMMR CRC had more favorable TNM stage and survival rates than low microsatellite instability (MSI-L) or microsatellite stable (MSS)/proficient mismatch repair (pMMR) CRC (P<0.05). In the GSE39582 dataset, LASSO regression selected 21 candidate MSI signature genes and the SVM-RFE algorithm selected 6. Combining the two algorithms identified six MSI signature genes: EIF5A, CXCL13, HNRNPL, HOXC6, RPL22L1, and Y16709. Further validation in the TCGA database showed that EIF5A had the highest diagnostic performance among them. Its AUC was 0.922 in the training set and 0.805 in the validation set. Spearman correlation analysis showed that EIF5A correlated positively mainly with CD8+ T cells, activated dendritic cells, helper T cells, M1 macrophages, γδ T cells, and neutrophils. It correlated negatively with CD4+ memory T cells, M2 macrophages, resting dendritic cells, eosinophils, and regulatory T cells.
Conclusion: In CRC, …
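The Methods use two evaluation statistics: the ROC AUC and Spearman's rank correlation. Both are rank-based, so they can be computed with the Python standard library alone. The sketch below is an illustration under stated assumptions, not the authors' pipeline; the abstract does not say which software was used.

The function names are hypothetical, and all input values are made-up toy numbers, not data from GSE39582 or TCGA. The AUC is computed through its equivalence with the normalized Mann-Whitney U statistic, with ties counting one half. It assumes that higher expression indicates MSI-H. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks. No p-values are computed.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via Mann-Whitney U. Labels: 1 = MSI-H, 0 = MSS/MSI-L.

    Assumes a higher score means "more likely MSI-H".
    """
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # U statistic for the positives
    return u / (n_pos * n_neg)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks (tie-safe)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (ZeroDivisionError) if either input is constant


# Hypothetical toy data: expression of one gene in six tumors,
# MSI-H labels, and a CIBERSORT-style CD8+ T-cell fraction per tumor.
expr = [5.1, 6.3, 4.8, 7.2, 6.5, 5.9]
msi_h = [0, 1, 0, 1, 0, 1]
cd8_fraction = [0.05, 0.12, 0.03, 0.15, 0.10, 0.08]

print(round(roc_auc(expr, msi_h), 3))  # 0.778
print(round(spearman_rho(expr, cd8_fraction), 3))  # 0.943
```

Under this convention, a gene whose expression is lower in MSI-H tumors gives an AUC below 0.5. Reversing the score direction turns that value into 1 - AUC.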