Machine learning combined with molecular SMILES-based features to predict the heat of formation of carbocations in oil processing


Authors: ZHAN Zhiwen; YANG Tao; LIU Xuhong [1,2]; ZHOU Yuwei [3,4]; ZHOU Liping

Affiliations: [1] Computer School, Beijing Information Science and Technology University, Beijing 102206, China; [2] Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing Information Science and Technology University, Beijing 102206, China; [3] Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China; [4] Synfuels China Technology Co., Ltd., Beijing 101407, China

Source: Journal of Fuel Chemistry and Technology, 2025, No. 4, pp. 613-624 (12 pages)

Funding: Supported by the National Natural Science Foundation of China (22272009).

Abstract: Carbocations are key intermediates in a wide range of chemical reactions associated with oil processing. Their heat of formation (HOF) is essential for calculating reaction enthalpy changes (ΔH), reaction energy barriers (Ea), and reaction rate constants (k), and for understanding reaction mechanisms. Direct experimental measurement of the HOF of carbocations is impractical, however, because these reactive intermediates are difficult to prepare and characterize. The HOF is therefore usually estimated with group additivity (GA)/group contribution (GC) methods or computed with high-precision quantum chemical (QC) methods. GA is an empirical technique that is fast but carries significant errors, whereas QC, usually based on first-principles calculations, is accurate but computationally expensive. In this study, we introduce a new approach for predicting the HOF of carbocations that integrates machine learning (ML) with molecular SMILES-based features, combining the strengths of ML algorithms with the structural information encoded in SMILES strings, a standard way to represent chemical structures. The aim is rapid and accurate prediction of the HOF at low computational cost. A dataset of 156 oil-related hydrocarbon carbocations was constructed, and their chemical information was extracted and represented with SMILES-based features. Several classical ML algorithms were then used for model training and construction, ultimately yielding a support vector machine-based regression (SVR) model. The SVR model achieved an R² of 0.957 on the training set and 0.966 on the testing set, suggesting that it effectively captures the relationship between the molecular features and the HOF of carbocations. The mean absolute error (MAE) of prediction was 2.21 kcal/mol. Combining ML with SMILES-based features not only offers an efficient and practical strategy for predicting the HOF of carbocations in oil processing, but also opens a new route for thermodynamic and kinetic studies of the related reaction processes.
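The abstract does not list the actual SMILES-based descriptors or the SVR settings, so the following is only a minimal sketch of the general idea: turn a carbocation SMILES string into a few numeric counts, then score predictions with the two metrics reported above (R² and MAE). The token pattern, the feature names, and the helper functions `smiles_features`, `mae`, and `r_squared` are all illustrative assumptions, not the authors' method. The numbers in the usage lines are arbitrary placeholders, not data from the paper.

```python
import re

# Assumed, simplified tokenizer for hydrocarbon cations only: bracket atoms
# such as [C+] or [CH2+], plain aliphatic/aromatic carbon, double/triple
# bonds, branch parentheses, and single-digit ring-closure labels.
TOKEN_RE = re.compile(r"\[[^\]]+\]|[Cc]|[=#()]|\d")


def smiles_features(smiles):
    """Return hypothetical count-based features for one SMILES string."""
    tokens = TOKEN_RE.findall(smiles)
    bracket = [t for t in tokens if t.startswith("[")]
    plain_carbons = sum(1 for t in tokens if t in ("C", "c"))
    bracket_carbons = sum(1 for t in bracket if "C" in t or "c" in t)
    return {
        "carbons": plain_carbons + bracket_carbons,
        "aromatic_atoms": sum(1 for t in tokens if t == "c"),
        "double_bonds": tokens.count("="),
        "branches": tokens.count("("),
        # Each ring uses one opening and one closing digit.
        "rings": sum(1 for t in tokens if t.isdigit()) // 2,
        "cation_centers": sum(1 for t in bracket if "+" in t),
    }


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# tert-Butyl cation and benzyl cation written as SMILES.
tert_butyl = smiles_features("C[C+](C)C")
benzyl = smiles_features("[CH2+]c1ccccc1")

# Arbitrary placeholder values, used only to exercise the metric functions.
example_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
example_r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

In a full workflow, feature rows like these would be stacked into a matrix and passed to a regressor (for instance scikit-learn's `SVR`, which is an assumption here, since the abstract names only the model type), with R² and MAE computed separately on the training and testing sets.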

Keywords: carbocation; heat of formation; machine learning; SMILES features

Classification code: O64 [Science: Physical Chemistry]

 
