Analysis of Specific Binding between TCR and Peptides Based on SVM Ensemble Learning


Authors: Bai Ningyi (白宁仪), Li Guojie (李国洁)

Affiliation: [1] School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, Henan

Source: Advances in Applied Mathematics (《应用数学进展》), 2024, No. 8, pp. 3618-3624 (7 pages)

Abstract: This paper proposes a two-stage method for predicting the specific binding of peptides to T cell receptors (TCRs), with the aim of improving accuracy by refining the prediction step by step. In the first stage, a stacked autoencoder produces numerical embeddings of the peptide and TCR sequences, focusing on the CDR3 region of the TCR β chain, a key determinant of peptide recognition. Atchley-factor encoding converts the biochemical properties of the amino acids into a numerical matrix, and unsupervised learning captures the key features of each sequence. Experiments show that the autoencoder reconstructs the original sequences with high fidelity, which supports the validity of the numerical embedding. In the second stage, an ensemble learning model built on the first-stage encodings predicts peptide-TCR specific binding. The model uses support vector machines (SVMs) with different kernels as base learners and combines their predictions by stacking, which improves generalization and exploits the complementarity among the base learners. Experiments show that the ensemble model performs significantly better than a single SVM model, and the improvement in its ROC value indicates that ensemble learning predicts peptide-TCR specific binding more accurately. The contribution of this work is the combination of autoencoder-based numerical embedding with an ensemble prediction model, which improves prediction accuracy and offers a new methodology for sequence analysis in bioinformatics.

Keywords: ensemble learning; specific binding; SVM; autoencoder

Classification number: TP3 [Automation and Computer Technology: Computer Science and Technology]
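
The second stage described in the abstract (SVMs with different kernels as base learners, combined by stacking) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available, uses synthetic features from `make_classification` as a stand-in for the autoencoder embeddings, and picks linear, RBF, and polynomial kernels with a logistic-regression meta-learner; the abstract specifies none of these choices. The names `build_stacked_svm` and `demo` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_stacked_svm():
    """Stack SVMs with different kernels under a logistic-regression meta-learner.

    The kernel set and the meta-learner are assumptions made for illustration.
    """
    base_learners = [
        (f"svm_{kernel}", make_pipeline(StandardScaler(), SVC(kernel=kernel)))
        for kernel in ("linear", "rbf", "poly")
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        # The meta-learner is trained on out-of-fold SVM decision values.
        stack_method="decision_function",
        cv=5,
    )


def demo():
    # Synthetic stand-in for the embedded peptide/TCR feature vectors (assumption).
    X, y = make_classification(
        n_samples=400, n_features=30, n_informative=10, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    model = build_stacked_svm()
    model.fit(X_train, y_train)
    # Probability of the positive ("binding") class from the meta-learner.
    scores = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores), scores
```

Calling `demo()` returns the area under the ROC curve on the held-out split together with the predicted scores. Comparing that value with the same metric for each base SVM fitted alone would mirror the single-model versus ensemble comparison the abstract reports.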

 
