Predicting protein signal peptides and their cleavage sites based on deep learning and domain rule modeling  Cited by: 8

Authors: Zhang Weixun, Pan Xiaoyong, Shen Hongbin [1]

Affiliation: [1] Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China

Source: Journal of Nanjing University of Science and Technology, 2020, No. 3, pp. 278-287 (10 pages)

Funding: National Natural Science Foundation of China (61725302, 61671288, 61903248).

Abstract: To improve the prediction accuracy of protein signal peptides and their cleavage sites, and to distinguish three different types of signal peptides effectively, a deep learning method is proposed that uses the position-specific scoring matrix (PSSM) and hidden Markov model (HMM) profiles produced by iterative homology detection. A neural network based on the self-attention mechanism is designed for signal peptide prediction, and a model ensemble method based on knowledge transfer is used to improve prediction performance. A conditional random field (CRF) built on a gated recurrent unit (GRU) network is designed to predict signal peptide cleavage sites, and a domain rule-based method is integrated to improve prediction ability. Experimental results show that the average Matthews correlation coefficient (MCC) of the method on the Sec/SPI, Sec/SPII and Tat/SPI signal peptide prediction tasks for Gram-negative and Gram-positive bacteria is 0.962. On the cleavage site prediction tasks for the same three signal peptide types in Gram-negative and Gram-positive bacteria, the average recall and accuracy are 0.698 and 0.662, respectively. On some signal peptide samples, the method correctly predicts cases that SignalP 5.0 predicts incorrectly, indicating that the two methods are partly complementary in cleavage site prediction.
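The abstract reports its signal peptide results as a Matthews correlation coefficient (MCC). As a reminder of what that number measures, the following is a minimal sketch of the standard binary MCC computed from confusion-matrix counts. It is not the authors' evaluation code: the function name `mcc`, the 0/1 label encoding, and the convention of returning 0.0 when the denominator vanishes are all assumptions made for illustration, and how the paper aggregates MCC across signal peptide types and organism groups is not stated in the abstract.

```python
import math


def mcc(y_true, y_pred):
    """Binary Matthews correlation coefficient.

    Hypothetical helper for illustration; labels are assumed to be
    1 (e.g. "has signal peptide") or 0 (e.g. "no signal peptide").
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: MCC is undefined here, so report 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up labels, not data from the paper.
    truth = [1, 1, 1, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 1]
    print(round(mcc(truth, preds), 3))
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), and it stays informative when the positive and negative classes are unbalanced, which is one reason it is commonly used for signal peptide benchmarks.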

Keywords: deep learning; domain rules; protein; signal peptide; knowledge transfer; gated recurrent unit; conditional random field

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
