Source: Advances in Applied Mathematics (《应用数学进展》), 2024, No. 8, pp. 3618-3624 (7 pages)
Abstract: This article proposes a two-stage approach to predicting the specific binding of peptides to T cell receptors (TCRs), aiming to improve accuracy by optimizing the prediction process step by step. In the first stage, a stacked autoencoder numerically embeds the peptide and TCR sequences, focusing on the CDR3 region of the TCR β chain, a key determinant of peptide recognition. Atchley factor encoding converts the biochemical properties of the amino acids into a numerical matrix, and unsupervised learning captures the key features of each sequence. Experiments show that the autoencoder reconstructs the original sequences with high fidelity, which supports the validity of the numerical embedding. In the second stage, an ensemble learning model built on the first-stage encodings predicts peptide-TCR specific binding. The model uses support vector machines (SVMs) with different kernels as base learners and combines their predictions by stacking, to improve generalization and exploit the complementarity among the base learners. Experiments show that the ensemble model significantly outperforms a single SVM model; the improvement in its ROC value indicates that ensemble learning predicts peptide-TCR specific binding more accurately. The innovation of this article lies in combining autoencoder-based numerical embedding with an ensemble learning prediction model, which not only improves prediction accuracy but also provides a new methodology for sequence analysis in bioinformatics.
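As a rough illustration of the first-stage encoding step, the sketch below maps an amino acid sequence to a fixed-size numerical matrix with one row of five factor values per residue. The abstract does not give the matrix layout, the padding scheme, or a maximum length, so the zero-padding, the `max_len` parameter, and the function name `encode_sequence` are assumptions made here. The factor table holds placeholder numbers only, not the published Atchley factors.

```python
from typing import Dict, List

# PLACEHOLDER values for illustration only: these are NOT the published
# Atchley factors. A real table has one 5-number row for each of the
# 20 standard amino acids.
PLACEHOLDER_FACTORS: Dict[str, List[float]] = {
    "A": [0.1, 0.2, 0.3, 0.4, 0.5],
    "C": [0.5, 0.4, 0.3, 0.2, 0.1],
    "S": [-0.1, -0.2, -0.3, -0.4, -0.5],
}


def encode_sequence(
    seq: str, factors: Dict[str, List[float]], max_len: int
) -> List[List[float]]:
    """Encode a sequence as a max_len x n_factors matrix (hypothetical helper).

    Each residue becomes its row of factor values; sequences shorter than
    max_len are padded with all-zero rows (an assumed padding scheme).
    """
    if len(seq) > max_len:
        raise ValueError(f"sequence length {len(seq)} exceeds max_len {max_len}")
    n_factors = len(next(iter(factors.values())))
    rows: List[List[float]] = []
    for residue in seq:
        if residue not in factors:
            raise KeyError(f"no factor values for residue {residue!r}")
        rows.append(list(factors[residue]))  # copy so the table is not aliased
    # Zero-pad to a fixed number of rows so every sequence has the same shape.
    rows.extend([0.0] * n_factors for _ in range(max_len - len(seq)))
    return rows


matrix = encode_sequence("CAS", PLACEHOLDER_FACTORS, max_len=5)
```

A fixed-shape matrix of this kind is what an autoencoder would take as input once flattened; the actual network architecture and training setup are not described in the abstract and are not sketched here.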
Classification: TP3 [Automation and Computer Technology - Computer Science and Technology]