Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning  


Authors: Xiaodan Yin, Chang-Yu Hsieh, Xiaorui Wang, Zhenxing Wu, Qing Ye, Honglei Bao, Yafeng Deng, Hongming Chen, Pei Luo, Huanxiang Liu, Tingjun Hou, Xiaojun Yao

Affiliations: [1] Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macao Institute for Applied Research in Medicine and Health, Macao University of Science and Technology, Macao 999078, China. [2] Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China. [3] Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China. [4] CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China. [5] Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China.

Source: Research (English), 2024, Issue 3, pp. 685-702 (18 pages)

Funding: The Science and Technology Development Fund, Macao SAR (file nos. 0056/2020/AMJ, 0114/2020/A3, and 0015/2019/AMJ); Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery (file no. 002/2023/ALC).

Abstract: Deep learning (DL)-driven efficient synthesis planning may profoundly transform the paradigm for designing novel pharmaceuticals and materials. However, the progress of many DL-assisted synthesis planning (DASP) algorithms has suffered from the lack of reliable automated pathway evaluation tools. As a critical metric for evaluating chemical reactions, accurate prediction of reaction yields helps improve the practicality of DASP algorithms in real-world scenarios. Currently, accurately predicting yields of interesting reactions still faces numerous challenges, mainly the absence of high-quality generic reaction yield datasets and robust generic yield predictors. To compensate for the limitations of high-throughput yield datasets, we curated a generic reaction yield dataset containing 12 reaction categories and rich reaction condition information. Subsequently, by utilizing 2 pretraining tasks based on chemical reaction masked language modeling and contrastive learning, we proposed a powerful bidirectional encoder representations from transformers (BERT)-based reaction yield predictor named Egret. It achieved comparable or even superior performance to the best previous models on 4 benchmark datasets and established state-of-the-art performance on the newly curated dataset. We found that reaction-condition-based contrastive learning enhances the model’s sensitivity to reaction conditions, and Egret is capable of capturing subtle differences between reactions involving identical reactants and products but different reaction conditions. Furthermore, we proposed a new scoring function that incorporated Egret into the evaluation of multistep synthesis routes. Test results showed that yield-incorporated scoring facilitated the prioritization of literature-supported high-yield reaction pathways for target molecules. In addition, through a meta-learning strategy, we further improved the reliability of the model’s prediction for reaction types with limited data and lower data quality. Our results suggest that Egret holds the potentia

Keywords: SYNTHESIS GENERIC holds

Classification number: O62 [Science: Organic Chemistry]

 
