Identification of mitochondrial feature genes for type 2 diabetes mellitus based on bioinformatics and machine learning


Authors: REN Jing; LI Yuting; LIU Xiaoqin; MENG Xianglong (College of Traditional Chinese Medicine and Food Engineering, Shanxi University of Chinese Medicine, Jinzhong 030619, Shanxi Province, China; College of Pharmacy, Shandong Xiandai University, Jinan 250104, China)

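Affiliations: [1] College of Traditional Chinese Medicine and Food Engineering, Shanxi University of Chinese Medicine, Jinzhong 030619, Shanxi, China [2] College of Pharmacy, Shandong Xiandai University, Jinan 250104, China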

Source: Journal of Mathematical Medicine, 2025, Issue 4, pp. 237-247 (11 pages)

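Funding: Shanxi Province Selective Funding Program for Science and Technology Activities of Overseas Returnees (20230034); Shanxi Province Research Funding Program for Returned Overseas Scholars (2023-156); Research Project of the Shanxi Provincial Administration of Traditional Chinese Medicine (2023ZYYA2012); Shanxi Province Applied Basic Research Program (20210302124694); Shanxi University of Chinese Medicine 2021 Shanxi Provincial Department of Education Project (2021L364).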

Abstract: Objective To screen mitochondrial function-related feature genes in the skeletal muscle of patients with type 2 diabetes mellitus (T2DM) and to construct a diagnostic model using bioinformatics and machine learning, in order to provide new insights for the clinical diagnosis of T2DM. Methods T2DM datasets from the Gene Expression Omnibus (GEO) database were integrated and processed in R 4.3.2 for data normalization and batch effect correction. Differentially expressed genes were screened and intersected with the mitochondrial genes in the MitoCarta 3.0 database. Key genes were identified with three machine learning algorithms: least absolute shrinkage and selection operator (LASSO) regression, random forest (RF), and support vector machine (SVM). The reliability of the feature genes was evaluated with a Gaussian mixture model (GMM), and a logistic regression diagnostic model was constructed. Model performance was assessed with the receiver operating characteristic (ROC) curve and externally validated in independent datasets. Results A total of 23 T2DM-related mitochondrial differentially expressed genes were obtained. The machine learning algorithms identified MRPS10, SLC25A5, and TRNT1 as the core feature genes, which were significantly enriched in the oxidative phosphorylation and fatty acid metabolism pathways. The diagnostic model achieved an area under the curve (AUC) of 0.958 in the training set and performed well in the validation sets GSE29221 (AUC=1.000) and GSE25724 (AUC=0.847). Conclusion The mitochondrial function genes MRPS10, SLC25A5, and TRNT1 identified in this study show potential as diagnostic biomarkers for T2DM. These findings not only provide new candidate biomarkers for diagnosing T2DM but also lay a foundation for further analysis of the molecular mechanisms of mitochondrial dysfunction in the pathogenesis of T2DM.
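The two numerical ideas the abstract relies on, combining the outputs of several feature selectors and scoring a model by AUC, can be illustrated with a minimal Python sketch. It is not the authors' R workflow. It assumes that "combining" three algorithms means taking the genes selected by all of them, which the abstract does not state. The helper names `consensus_genes` and `roc_auc`, the placeholder genes `GENE_A` and `GENE_B`, and all toy lists and scores are hypothetical and are not data from the study.

```python
from typing import Iterable, Sequence


def consensus_genes(*gene_lists: Iterable[str]) -> set[str]:
    """Return the genes that appear in every input list (set intersection)."""
    sets = [set(genes) for genes in gene_lists]
    return set.intersection(*sets) if sets else set()


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win,
    which matches the area under the empirical ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs for illustration only, NOT results from the study.
mito_degs = {"MRPS10", "SLC25A5", "TRNT1", "GENE_A", "GENE_B"}
lasso_hits = ["MRPS10", "SLC25A5", "TRNT1", "GENE_A"]
rf_hits = ["MRPS10", "SLC25A5", "TRNT1", "GENE_B"]
svm_hits = ["TRNT1", "MRPS10", "SLC25A5"]

# Assumed combination rule: keep genes chosen by all three selectors.
core = consensus_genes(mito_degs, lasso_hits, rf_hits, svm_hits)

# Toy labels (1 = T2DM, 0 = control) and made-up model scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data, eight of the nine positive-negative pairs are ordered correctly, so `toy_auc` is 8/9. An AUC of 1.000, as reported for GSE29221, means every T2DM sample scored above every control sample in that dataset.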

Keywords: type 2 diabetes mellitus; mitochondrial genes; bioinformatics; machine learning

Classification number: R587.1 [Medicine and Health: Endocrinology]

 
