Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning

Authors: Xiaodan Yin, Chang-Yu Hsieh, Xiaorui Wang, Zhenxing Wu, Qing Ye, Honglei Bao, Yafeng Deng, Hongming Chen, Pei Luo, Huanxiang Liu, Tingjun Hou, Xiaojun Yao

Affiliations: [1] Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macao Institute for Applied Research in Medicine and Health, Macao University of Science and Technology, Macao 999078, China; [2] Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; [3] Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China; [4] CarbonSilicon AI Technology Co. Ltd, Hangzhou, Zhejiang 310018, China; [5] Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China.

Source: Research (研究, English-language), 2024, Issue 3, pp. 685-702 (18 pages)

Funding: The Science and Technology Development Fund, Macao SAR (file nos. 0056/2020/AMJ, 0114/2020/A3, and 0015/2019/AMJ); Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery (file no. 002/2023/ALC).

Abstract: Deep learning (DL)-driven efficient synthesis planning may profoundly transform the paradigm for designing novel pharmaceuticals and materials. However, the progress of many DL-assisted synthesis planning (DASP) algorithms has suffered from the lack of reliable automated pathway evaluation tools. As a critical metric for evaluating chemical reactions, accurate prediction of reaction yields helps improve the practicality of DASP algorithms in real-world scenarios. Currently, accurately predicting yields of interesting reactions still faces numerous challenges, mainly the absence of high-quality generic reaction yield datasets and of robust generic yield predictors. To compensate for the limitations of high-throughput yield datasets, we curated a generic reaction yield dataset containing 12 reaction categories and rich reaction condition information. Subsequently, by utilizing 2 pretraining tasks based on chemical reaction masked language modeling and contrastive learning, we proposed a powerful bidirectional encoder representations from transformers (BERT)-based reaction yield predictor named Egret. It achieved comparable or even superior performance to the best previous models on 4 benchmark datasets and established state-of-the-art performance on the newly curated dataset. We found that reaction-condition-based contrastive learning enhances the model’s sensitivity to reaction conditions, and Egret is capable of capturing subtle differences between reactions involving identical reactants and products but different reaction conditions. Furthermore, we proposed a new scoring function that incorporated Egret into the evaluation of multistep synthesis routes. Test results showed that yield-incorporated scoring facilitated the prioritization of literature-supported high-yield reaction pathways for target molecules. In addition, through a meta-learning strategy, we further improved the reliability of the model’s prediction for reaction types with limited data and lower data quality. Our results suggest that Egret holds the potentia

Keywords: SYNTHESIS GENERIC holds

Classification code: O62 [Science: Organic Chemistry]
