A predictive language model for SARS-CoV-2 evolution  

Authors: Enhao Ma, Xuan Guo, Mingda Hu, Penghua Wang, Xin Wang, Congwen Wei, Gong Cheng

Affiliations: [1] School of Basic Medical Science, Tsinghua University, 30 Shuangqing Rd., Haidian District, Beijing 100084, China; [2] Institute of Infectious Diseases, Shenzhen Bay Laboratory, Guangqiao Rd., Guangming District, Shenzhen, Guangdong 518000, China; [3] Beijing Institute of Biotechnology, 20 Dongdajie, Fengtai District, Beijing 100071, China; [4] Department of Immunology, School of Medicine, University of Connecticut Health Center, Farmington, CT 06030, USA

Source: Signal Transduction and Targeted Therapy (English-language journal), 2025, Issue 1, pp. 394-410 (17 pages)

Funding: Funded by grants from the National Natural Science Foundation of China (32188101, 81961160737, and 31825001) to G.C.; the National Key Research and Development Plan of China (2021YFC2300200, 2020YFC1200104, 2021YFC2302405, 2022YFC2303200, and 2022YFC2303400); Tsinghua-Foshan Innovation Special Fund (TFISF) (2022THFS6124); Shenzhen San-Ming Project for Prevention and Research on Vector-borne Diseases (SZSM201611064); Shenzhen Science and Technology Project (JSGG20191129144225464) to G.C.; Shenzhen Medical Research Fund (2404002); Innovation Team Project of Yunnan Science and Technology Department (202105AE160020); and the Yunnan Cheng gong expert workstation (202005AF150034) to G.C. This work is also financially supported by the XPLORER PRIZE from the Tencent Foundation.

Abstract: Modeling and predicting mutations are critical for COVID-19 and similar pandemic preparedness. However, existing predictive models have yet to integrate the regularity and randomness of viral mutations with minimal data requirements. Here, we develop a non-demanding language model utilizing both regularity and randomness to predict candidate SARS-CoV-2 variants and mutations that might prevail. We constructed the “grammatical frameworks” of the available S1 sequences for dimension reduction and semantic representation to grasp the model’s latent regularity. The mutational profile, defined as the frequency of mutations, was introduced into the model to incorporate randomness. With this model, we successfully identified several variants with significantly enhanced viral infectivity and immune evasion and validated them by wet-lab experiments. By inputting the sequence data from three different time points, we detected circulating strains or vital mutations for the XBB.1.16, EG.5, JN.1, and BA.2.86 strains before their emergence. In addition, our results predicted previously unknown variants that may cause future epidemics. With both data validation and experimental evidence, our study represents a fast-responding, concise, and promising language model, potentially generalizable to other viral pathogens, to forecast viral evolution and detect crucial mutation hot spots, thus warning of emerging variants that might raise public health concern.
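The abstract defines the mutational profile as the frequency of mutations. As a rough illustration of that definition only, the Python sketch below counts how often each substitution appears across a set of aligned sequences relative to a reference. It is not the authors' implementation: the function name `mutational_profile`, the assumption that sequences are pre-aligned and of equal length, and the toy sequences are all hypothetical.

```python
from collections import Counter


def mutational_profile(reference, sequences):
    """Map (position, ref_residue, alt_residue) to the fraction of sequences carrying it.

    Assumes every sequence is already aligned to `reference` (same length);
    positions are 1-based. This is an illustrative definition, not the paper's.
    """
    if not sequences:
        return {}
    counts = Counter()
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference")
        for pos, (ref, alt) in enumerate(zip(reference, seq), start=1):
            if alt != ref:
                counts[(pos, ref, alt)] += 1
    total = len(sequences)
    return {mutation: n / total for mutation, n in counts.items()}


# Toy data, made up for illustration (not real S1 sequences).
reference = "MFVFLV"
sequences = ["MFVFLV", "MFAFLV", "MFAFLI", "MYVFLV"]
profile = mutational_profile(reference, sequences)
```

Here `profile` holds one entry per observed substitution; for example, the V-to-A change at position 3 occurs in two of the four toy sequences. How such frequencies are actually fed into the language model is not described in the abstract.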

Keywords: REGULARITY utilizing INTEGRATE

Classification code: O17 [Science: Mathematics]

 
