Protein Secondary Structure Prediction Based on Long-Short-Term Memory Recurrent Network and Radical Group Features


Authors: Xinyi Han (韩心怡), Yihui Liu (刘毅慧)[1]

Affiliation: [1] School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong

Source: Hans Journal of Computational Biology (《计算生物学》), 2020, No. 4, pp. 57-68 (12 pages)

Abstract: Protein secondary structure prediction is an important topic in protein structure research. With the development of machine learning and deep learning, a wide variety of prediction models have been proposed. The experiments use a bidirectional long short-term memory (LSTM) recurrent network, which removes the sliding-window restriction and takes into account both long-range interactions between amino acids and the influence of the context on either side of each position in the amino acid sequence. The input features of the network are redesigned: 42 radical group features are added to the position-specific scoring matrix (PSSM), and the model is trained on a large data set. On the public test sets CASP9, CASP10, CASP11 and CASP12, the Q3 accuracy reaches 85.74%, 86.83%, 84.73% and 83.79%, respectively. The results suggest that protein secondary structure prediction can be studied further in three directions: the design of new features, the modeling of long-range amino acid interactions, and the use of large data sets.
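The Q3 figures quoted in the abstract are per-residue accuracies over three secondary structure states. As a minimal sketch of how such a score is usually computed, the Python snippet below compares a predicted label string with an observed one. The function name `q3_accuracy`, the H/E/C label alphabet, and the 10-residue example are illustrative assumptions, not details taken from the paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose 3-state label matches.

    Assumes one label per residue, e.g. H (helix), E (strand), C (coil).
    This labeling convention is an assumption for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    # Hypothetical 10-residue example with one misclassified residue.
    observed = "CHHHHEEECC"
    predicted = "CHHHCEEECC"
    print(f"Q3 = {q3_accuracy(predicted, observed):.2%}")  # Q3 = 90.00%
```

Scores reported for a whole test set are typically pooled over all residues or averaged per protein; the abstract does not say which convention the authors used.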

Keywords: protein; protein secondary structure prediction; recurrent network; radical group; structure prediction

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
