Theoretical tandem mass spectrometry prediction method for peptide sequences based on Transformer and gated recurrent unit

Authors: HE Changjiu; YANG Jinghan; ZHOU Piyu; BIAN Xinye; LYU Mingming; DONG Di; FU Yan [2]; WANG Haipeng [1]

Affiliations: [1] School of Computer Science and Technology, Shandong University of Technology, Zibo, Shandong 255049, China; [2] Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

Source: Journal of Computer Applications, 2024, No. 12, pp. 3958-3964 (7 pages)

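Funding: National Key Research and Development Program of China (2022YFA1304603); Support Program for Outstanding Youth Innovation Teams in Higher Education Institutions of Shandong Province (2019KJN048).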

Abstract: Existing methods for predicting theoretical tandem mass spectra are limited to b and y backbone fragment ions, and a single model has difficulty capturing the complex relationships within peptide sequences. To address these problems, a theoretical tandem mass spectrum prediction method for peptide sequences based on Transformer and Gated Recurrent Unit (GRU), named DeepCollider, was proposed. Firstly, a deep learning architecture combining Transformer and GRU was used to strengthen the modeling of the relationship between peptide sequences and fragment ion intensities, drawing on the self-attention mechanism and long-distance dependencies. Secondly, unlike existing methods that encode a peptide sequence to predict all b and y backbone ions, fragmentation flags were used to mark the fragmentation sites in the peptide sequence, so that a specific fragmentation site could be encoded and its corresponding fragment ions predicted. Finally, Pearson Correlation Coefficient (PCC) and Mean Absolute Error (MAE) were used as evaluation metrics to measure the similarity between predicted and experimental spectra. Experimental results show that, compared with existing methods limited to b and y backbone fragment ions, such as pDeep and Prosit, DeepCollider has advantages in both metrics: the PCC value increased by 0.15 and the MAE value decreased by 0.005. DeepCollider thus not only predicts b, y, and a backbone ions together with their corresponding water-loss and ammonia-loss neutral loss ions, but also further improves the peak coverage and similarity of theoretical spectrum prediction.
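The two evaluation metrics named in the abstract, PCC and MAE, can be illustrated with a minimal sketch. It assumes that a predicted spectrum and an experimental spectrum have already been reduced to equal-length lists of normalized fragment ion intensities aligned ion by ion; the function names and the sample intensities are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(pred, obs):
    """Pearson correlation coefficient between two aligned intensity lists."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    # Covariance and variances are left unscaled; the 1/n factors cancel.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_p == 0 or var_o == 0:
        raise ValueError("PCC is undefined when one list is constant")
    return cov / math.sqrt(var_p * var_o)


def mean_absolute_error(pred, obs):
    """Mean absolute difference between two aligned intensity lists."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)


# Hypothetical normalized intensities for five fragment ions (assumption:
# each position refers to the same ion in both spectra).
predicted = [0.10, 0.80, 0.35, 0.00, 1.00]
experimental = [0.15, 0.75, 0.30, 0.05, 1.00]

pcc = pearson_correlation(predicted, experimental)
mae = mean_absolute_error(predicted, experimental)
print(f"PCC = {pcc:.4f}, MAE = {mae:.4f}")
```

A higher PCC and a lower MAE both indicate a predicted spectrum closer to the experimental one, which is the direction of the improvements reported in the abstract.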

Keywords: theoretical mass spectrum prediction; peptide sequence; fragment ion intensity; proteomics; deep learning

Classification number: TP391.9 [Automation and Computer Technology: Computer Application Technology]

 
