Authors: Xinghai Li (李兴海), Zhisen Wu (吴志森), Lijing Zhang (张利静), Shengyang Tao (陶胜洋) [1]
Affiliation: [1] School of Chemistry, State Key Laboratory of Fine Chemicals, Frontier Science Center for Smart Materials, Dalian Key Laboratory of Intelligent Chemistry, Dalian University of Technology, Dalian 116024, Liaoning Province, China
Source: Acta Physico-Chimica Sinica (《物理化学学报》), 2025, Issue 2, pp. 81-89 (9 pages)
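Funding: National Natural Science Foundation of China (22072011, 22372025, 22211530456); Fundamental Research Funds for the Central Universities (DUT22LAB607, DUT22QN226); Project 1912 of the Chinese Aeronautical Establishment.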
Abstract: Machine learning (ML) shows significant promise for molecular synthesis. However, accurate ML prediction depends on large amounts of experimental data, and obtaining thousands of data points through traditional experimental methods remains a major challenge. Obtaining an acceptable predictive model from a small dataset is therefore a pressing problem in the field. This study constructed a dataset of 1152 reactions and, using a large number of chemically meaningful feature descriptors together with multidimensional data analysis, obtained effective predictions, demonstrating that ML algorithms trained on a small dataset can reliably predict the conversion rate of amide bond synthesis reactions. The study compared the prediction accuracy of six ML algorithms, among which random forest showed outstanding predictive performance (R² > 0.95). In addition, when predicting the conversion rate of unknown aromatic amine molecules, the study found that adding a small amount of reaction data related to the unknown molecule to the training set significantly improves prediction accuracy for that molecule, even when the dataset is small, revealing a way to obtain good predictions from small datasets. This study provides a reference for ML-assisted chemical synthesis research with small datasets. In the near future, machine learning will strongly advance the intelligent development of synthetic organic chemistry.

Abstract (journal's English version): Machine learning (ML) is progressively revealing notable advantages in chemical synthesis. However, the limited output of experimental data from traditional methods poses a bottleneck, impeding the widespread adoption of machine learning. Data from literature often leads to overly optimistic predictions, and obtaining thousands of experimental data points through experiments remains a substantial challenge. Using a small dataset of experimental data, we illustrated that machine learning algorithms can reliably predict the conversion rate of amide bond synthesis. We gathered hundreds of experimental data points for 9 aromatic amines and 12 organic acids using various coupling reagents and solvents in a 96-well plate high-throughput experimental setup. Subsequently, we derived 76 feature molecular descriptors from quantum chemical calculations and utilized them as inputs for training the machine learning model. Despite the inherent limitation of low data volume, the random forest algorithm demonstrated outstanding predictive performance (R² > 0.95). Through comprehensive analysis of the reaction process employing importance analysis, Shapley additive explanations (SHAP), and accumulated local effects (ALE) methods, we delved into the important factors influencing the reaction conversion rate. In predicting the conversion rate of unknown aromatic amine molecules, we discovered that incorporating a small amount of unknown molecule-related reaction data into the training set effectively enhances the model’s predictive performance, even with a small dataset. By comparing models trained on different molecular descriptors such as density functional theory (DFT) and one-hot encoding, we validated the efficacy of adjusting the training set to improve prediction results. This study utilized a multitude of chemically meaningful feature descriptors and achieved more effective prediction results through multidimensional data analysis, offering valuable insights for machine learning-assisted chemical synthesis research in small datasets. In the near future