Using Feature Selection Process and Machine Learning Classification Models to Predict Subtypes of Renal Cell Carcinoma


Author: Wei Jiayi (魏嘉怡)

Affiliation: [1] School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong

Source: Advances in Applied Mathematics (《应用数学进展》), 2023, No. 7, pp. 3398-3413 (16 pages)

Abstract: Renal cell carcinoma is a common and fatal disease that accounts for the vast majority of kidney cancers. It is a complex, heterogeneous disease made up mainly of three histological subtypes that differ biologically and clinically. Modern technology now makes it possible to obtain molecular subtypes and biomarkers for renal cell carcinoma. In this study, we combined three feature selection techniques, namely mRMR, Lasso, and Boruta, and used a voting scheme to select the most significant features from multiple single-omics TCGA datasets. The selected features served as input to base machine learning models for classifying the histological subtypes of renal cell carcinoma. We evaluated six classification models: logistic regression (LR), random forest (RF), support vector machine (SVM), naive Bayes (NB), k-nearest neighbor (KNN), and XGBoost. The results show that, with the new feature selection process proposed in this paper, features from the miRNA mature strand expression RNAseq dataset outperformed the other classification approaches in accuracy, achieving the highest performance under the logistic regression model with an accuracy of 0.9779 and an AUC of 0.9834. The improved and refined feature selection and classification therefore provide diagnostic markers that may help increase diagnostic accuracy, support the design of early treatment strategies, and improve the survival of patients with renal cell carcinoma.
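
The voting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule, so the threshold used here (keep a feature chosen by at least two of the three selectors) is an assumption. The function name `vote_features` and the miRNA names are hypothetical placeholders, not outputs of the paper, and the three selectors themselves (mRMR, Lasso, Boruta) are not implemented here.

```python
from collections import Counter
from typing import Dict, Iterable, List


def vote_features(selections: Dict[str, Iterable[str]], min_votes: int = 2) -> List[str]:
    """Return features picked by at least `min_votes` selectors.

    The result is ordered by vote count (descending), then by name.
    The default threshold of 2 is an assumed majority rule.
    """
    votes: Counter = Counter()
    for selected in selections.values():
        # A set ensures each selector votes at most once per feature.
        votes.update(set(selected))
    kept = [name for name, count in votes.items() if count >= min_votes]
    return sorted(kept, key=lambda name: (-votes[name], name))


# Hypothetical selector outputs; placeholder feature names only.
selections = {
    "mRMR": ["miR-A", "miR-B", "miR-C"],
    "Lasso": ["miR-B", "miR-C", "miR-D"],
    "Boruta": ["miR-C", "miR-E"],
}

consensus = vote_features(selections, min_votes=2)
print(consensus)  # ['miR-C', 'miR-B']
```

Under these assumptions, the consensus list would then be the input columns for each of the six classifiers. Raising `min_votes` to 3 keeps only features on which all selectors agree, trading coverage for stricter agreement.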

Keywords: renal cell carcinoma; subtype classification; feature selection; machine learning

Classification code: R73 [Medicine and Health: Oncology]

 
