Weighted gene co-expression network analysis and machine learning identification of key genes in rheumatoid arthritis synovium  Cited by: 1


Authors: Wu Yingkai; Shi Gaolong; Xie Zonggang [1] (The Second Affiliated Hospital of Soochow University, Suzhou 215000, Jiangsu Province, China; The First People’s Hospital of Ningyang County, Taian 271000, Shandong Province, China)

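Affiliations: [1] The Second Affiliated Hospital of Soochow University, Suzhou 215000, Jiangsu Province, China [2] The First People’s Hospital of Ningyang County, Taian 271000, Shandong Province, China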

Source: Chinese Journal of Tissue Engineering Research, 2025, Issue 2, pp. 294-301 (8 pages)

Abstract: BACKGROUND: Rheumatoid arthritis is a systemic immune-related disease whose main pathological features are inflammatory hyperplasia of the joint synovium and destruction of articular cartilage. Its pathogenesis remains unclear, so there is an urgent need for new diagnostic biomarkers with high sensitivity and specificity. OBJECTIVE: To identify and screen key genes in the synovium of rheumatoid arthritis patients by combining bioinformatics techniques with machine learning algorithms, and to construct and validate a rheumatoid arthritis prediction model. METHODS: Three datasets containing synovial tissue samples from rheumatoid arthritis patients (GSE77298, GSE55235, GSE55457) were downloaded from the Gene Expression Omnibus (GEO) database. GSE77298 and GSE55235 served as the training set and GSE55457 as the test set, for a total of 66 samples: 39 synovial samples from rheumatoid arthritis patients and 27 normal synovial samples. Differentially expressed genes in the training set were screened in R, and weighted gene co-expression network analysis was then used to group the training-set genes into modules. Feature genes were selected from the key module and intersected with the differentially expressed genes, and the intersecting genes were carried forward to machine learning. Three machine learning methods, namely the least absolute shrinkage and selection operator (LASSO) algorithm, support vector machine with recursive feature elimination (SVM-RFE), and the random forest algorithm, were applied to the intersecting genes to identify hub genes. The hub genes from the three algorithms were intersected again to obtain the key genes in the rheumatoid arthritis synovium. A nomogram model for predicting rheumatoid arthritis was constructed with the key genes as variables to estimate a patient's risk of developing rheumatoid arthritis, and the receiver operating characteristic curve was used to determine the diagnostic value of the prediction model and its key genes. RESULTS AND CONCLUSION: (1) Differential analysis identified 730 differentially expressed genes in the training set, weighted gene co-expression network analysis yielded 185 feature genes, and 159 genes lay in their intersection. (2) LASSO identified 4 hub genes, SVM-RFE identified 11, and random forest identified 5; intersecting the three sets yielded 2 key genes (TNS3, SDC1). (3) Nomograms based on the 2 key genes were constructed in the training and test sets; their calibration curves fit the standard curve closely, and their clinical efficacy in predicting the onset of rheumatoid arthritis was good. (4) These results confirm that, on the basis of bioinformatics and machine learning algorithms, the T…
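The consensus step described in the abstract (intersecting the hub genes from three selection methods) and the ROC-based evaluation can be illustrated with a minimal, standard-library Python sketch. It is not the authors' R workflow. Only TNS3 and SDC1 and the set sizes (4, 11, 5) come from the abstract; the `GENE_*` names, the function names `consensus_genes` and `auc`, and the example labels and scores are hypothetical placeholders.

```python
from functools import reduce


def consensus_genes(*gene_sets):
    """Return the genes selected by every method, in sorted order."""
    if not gene_sets:
        return []
    common = reduce(set.intersection, (set(g) for g in gene_sets))
    return sorted(common)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hub-gene lists with the sizes reported in the abstract.
# GENE_* entries are placeholders, not genes from the study.
lasso_hubs = ["TNS3", "SDC1", "GENE_A", "GENE_B"]                      # 4 genes
svm_rfe_hubs = ["TNS3", "SDC1"] + [f"GENE_{c}" for c in "CDEFGHIJK"]   # 11 genes
rf_hubs = ["TNS3", "SDC1", "GENE_A", "GENE_C", "GENE_L"]               # 5 genes

key_genes = consensus_genes(lasso_hubs, svm_rfe_hubs, rf_hubs)
print(key_genes)

# Hypothetical labels (1 = rheumatoid arthritis, 0 = normal) and model scores.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.2]
print(round(auc(example_labels, example_scores), 3))
```

Requiring agreement across all three methods is a conservative choice: a gene that only one selector favors is dropped, which keeps the final set small at the cost of possibly discarding useful markers.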

Keywords: weighted gene co-expression network; machine learning algorithm; rheumatoid arthritis; key gene; prediction model

Classification codes: R459.9 [Medicine and Health: Therapeutics]; R318 [Medicine and Health: Clinical Medicine]; R684

 
