Protein function prediction based on coevolutionary information and deep learning


Authors: Wang Jinlei, Ding Xueming[1], Qin Qiqi, Peng Boya (School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China)

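Affiliation: [1] School of Optical-Electrical & Computer Engineering, University of Shanghai for Science & Technology, Shanghai 200093, China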

Source: Application Research of Computers, 2023, No. 12, pp. 3572-3577 (6 pages)

Funding: National Natural Science Foundation of China (11502145).

Abstract: Protein function is crucial for understanding the mechanisms of cellular and biological activity and for studying disease mechanisms. With sequence databases growing rapidly, traditional experimental and sequence-alignment methods cannot support large-scale protein function annotation. To address this, this paper proposes EGNet (evolutionary graph network), which obtains protein sequence encodings from the pre-trained protein language model ESM2 and from one-hot encoding. Using sequence self-attention and physical calculation, the model derives two kinds of inter-residue coevolutionary information: PI (paired interaction) and SPI (strong paired interaction). The two kinds of coevolutionary information and the sequence encodings are then fed into a multi-layer cascaded graph convolutional network, which learns node features from the sequence encodings and performs end-to-end protein function prediction. Compared with earlier methods, EGNet performs better on the EC (Enzyme Commission) category labels of the ENZYME database, reaching an F-score of 0.89 and an AUPR of 0.91. The results indicate that EGNet achieves good performance using only a single sequence as input, providing fast and effective protein function annotation.
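The data flow described in the abstract (sequence encoding, a residue-pair graph, stacked graph convolutions, then per-label function scores) can be illustrated with a minimal pure-Python sketch. It is not the authors' EGNet implementation, and several pieces are assumptions: ESM2 embeddings are omitted so only the one-hot encoding is used; the pairwise matrix `scores` is a hypothetical stand-in for PI/SPI, and the symmetrize-and-threshold rule that turns it into edges is our own choice; the layers use the standard Kipf-Welling GCN normalization, simply stacked (the paper's exact cascading scheme is not described here); mean pooling and a sigmoid output head are assumed; and all weights are random and untrained. Every function name below is hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """L x 20 one-hot matrix; non-standard residues become all-zero rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1.0
        rows.append(row)
    return rows


def pair_matrix_to_adjacency(scores, threshold):
    """Hypothetical stand-in for PI/SPI: symmetrize an L x L pairwise score
    matrix, keep pairs scoring >= threshold as edges, and add self-loops."""
    n = len(scores)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.5 * (scores[i][j] + scores[j][i])
            if i == j or s >= threshold:
                adj[i][j] = 1.0
    return adj


def normalize(adj):
    """Symmetric GCN normalization D^-1/2 A D^-1/2 (degrees >= 1 thanks to self-loops)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


def random_matrix(rows, cols, rng):
    """Small Gaussian weights; untrained, used only to show tensor shapes."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def predict(seq, scores, n_labels, hidden=16, layers=2, threshold=0.5, seed=0):
    """Untrained forward pass: one-hot -> stacked GCN -> mean pool -> sigmoid per label."""
    rng = random.Random(seed)  # fixed seed so repeated calls give identical outputs
    h = one_hot(seq)
    a_norm = normalize(pair_matrix_to_adjacency(scores, threshold))
    in_dim = len(AMINO_ACIDS)
    for _ in range(layers):
        h = gcn_layer(a_norm, h, random_matrix(in_dim, hidden, rng))
        in_dim = hidden
    # Graph-level readout: average the residue (node) features.
    pooled = [sum(row[j] for row in h) / len(h) for j in range(hidden)]
    w_out = random_matrix(hidden, n_labels, rng)
    logits = [sum(pooled[k] * w_out[k][c] for k in range(hidden)) for c in range(n_labels)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy usage: treat residues within 2 positions of each other as "coupled".
seq = "MKTAYIAKQR"
scores = [[1.0 if abs(i - j) <= 2 else 0.0 for j in range(len(seq))] for i in range(len(seq))]
print([round(p, 3) for p in predict(seq, scores, n_labels=4)])
```

A per-label sigmoid is used rather than a softmax because, in this sketch, each EC label is treated as an independent binary target, so one protein can receive several labels at once. In a real pipeline the one-hot rows would be concatenated with ESM2 embeddings and the weights learned from annotated sequences.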

Keywords: protein function; deep learning; coevolutionary information; language model; graph convolutional neural network

CLC number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
