Deciphering “the language of nature”: A transformer-based language model for deleterious mutations in proteins  


Authors: Theodore T. Jiang, Li Fang, Kai Wang

Affiliations: [1] Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA [2] Palisades Charter High School, Pacific Palisades, CA 90272, USA [3] Massachusetts Institute of Technology, Cambridge, MA 02139, USA [4] Department of Genetics and Biomedical Informatics, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China [5] Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

Source: The Innovation, 2023, Issue 5, pp. 47-58 (12 pages in total)

Funding: NIH grant GM132713 (K.W.); CHOP Research Institute and the Fundamental Research Funds for the Central Universities, Sun Yat-sen University (No. 23ptpy119, to L.F.).

Abstract: Various machine-learning models, including deep neural network models, have already been developed to predict the deleteriousness of missense (non-synonymous) mutations. However, the current state of the art may still benefit from a fresh look at the biological problem using more sophisticated self-adaptive machine-learning approaches. Recent advances in the field of natural language processing show that transformer models, a type of deep neural network, are particularly powerful at modeling sequence information with context dependence. In this study, we introduce MutFormer, a transformer-based model for the prediction of deleterious missense mutations, which uses reference and mutated protein sequences from the human genome as the primary features. MutFormer takes advantage of a combination of self-attention layers and convolutional layers to learn both long-range and short-range dependencies between amino acid mutations in a protein sequence. We first pre-trained MutFormer on reference protein sequences and mutated protein sequences resulting from common genetic variants observed in human populations. We next examined different fine-tuning methods to successfully apply the model to deleteriousness prediction of missense mutations. Finally, we evaluated MutFormer’s performance on multiple testing datasets.
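
The abstract states that reference and mutated protein sequences serve as the primary features. A minimal sketch of how such a sequence pair could be built for a single missense substitution is given below. The function names, the 1-based position convention, and the fixed-size window around the mutated residue are illustrative assumptions, not details taken from the paper or from the MutFormer code.

```python
# Illustrative sketch only: all names here are hypothetical and are not
# taken from the MutFormer code base.

def apply_missense(reference: str, position: int, ref_aa: str, alt_aa: str) -> str:
    """Return the mutated protein sequence for one missense substitution.

    `position` is 1-based (an assumption of this sketch).
    """
    index = position - 1
    if not 0 <= index < len(reference):
        raise ValueError(f"position {position} is outside the sequence")
    if reference[index] != ref_aa:
        raise ValueError(
            f"expected {ref_aa} at position {position}, found {reference[index]}"
        )
    if alt_aa == ref_aa:
        raise ValueError("a missense mutation must change the amino acid")
    return reference[:index] + alt_aa + reference[index + 1:]


def centered_window(sequence: str, position: int, flank: int) -> str:
    """Return up to `flank` residues on each side of a 1-based position."""
    index = position - 1
    start = max(0, index - flank)
    end = min(len(sequence), index + flank + 1)
    return sequence[start:end]


def make_pair(reference: str, position: int, ref_aa: str, alt_aa: str,
              flank: int = 4) -> tuple[str, str]:
    """Build a (reference window, mutated window) pair around the mutation."""
    mutated = apply_missense(reference, position, ref_aa, alt_aa)
    return (
        centered_window(reference, position, flank),
        centered_window(mutated, position, flank),
    )


if __name__ == "__main__":
    toy_reference = "MKTAYIAKQR"  # made-up sequence, not a real protein
    print(make_pair(toy_reference, 5, "Y", "C", flank=2))
```

In this sketch the two windows differ only at the mutated residue, so a downstream model would see the same local context for both members of the pair.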

Keywords: DELETE PREDICTION apply

Classification number: Q51 [Biology: Biochemistry]

 
