Construction of a machine learning ensemble prediction model for gas chromatographic retention index on stationary phases with different polarities

Original title: 不同极性固定相上气相色谱保留指数机器学习集成预测模型的构建

Authors: WANG Qianyi (王芊懿), ZHU Yongle (朱永乐), LI Xuehua (李雪花) [1]

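Affiliation: [1] Key Laboratory of Industrial Ecology and Environmental Engineering, Ministry of Education, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, Liaoning, China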

Source: Chinese Journal of Chromatography (《色谱》), 2025, No. 4, pp. 355-362 (8 pages)

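Abstract: The retention index (RI) is a measure used in chromatographic analysis to characterize the retention behavior of compounds, and it is an important parameter for structural identification. Because the RI of a compound differs across stationary phases of different polarity, current prediction models built for a single-polarity stationary phase cannot be applied effectively to stationary phases of multiple polarities. This study therefore established a model for predicting gas chromatographic RIs on stationary phases of different polarity. A total of 4183 RI records for 2499 compounds on 8 types of stationary phase were collected from the literature. The stationary phases were further grouped by their McReynolds constants into five classes: strongly polar, polar, medium polar, weakly polar, and non-polar. Molecular structural features of the compounds were coupled with one-hot-encoded stationary-phase polarity features as the model input, and machine learning prediction models were built with 9 algorithms. Based on the two best-performing algorithms, XGBoost and LightGBM, an ensemble learning model was built by voting regression. Its training-set coefficient of determination (R²) was 0.99, its training-set root mean square error (RMSE) was 101.85, its test-set R² was 0.97, and its test-set RMSE was 107.44. The applicability domain of the model was characterized with a Williams plot, and more than 94% of the data fell within it. By combining two kinds of features, stationary-phase polarity and compound structure, the study developed an RI prediction model that adapts to stationary phases of multiple polarities, overcoming the limitation of existing single-polarity models and greatly broadening the range of application. Compared with the individual machine learning models, the ensemble model showed better robustness and predictive ability. The model has scientific and practical value for improving the efficiency and accuracy of target and non-target gas chromatographic analysis.

Gas chromatography is an analytical technique that is widely used to separate and identify various compounds. The retention index (RI) plays a significant role in gas chromatography because it provides a standardized measure for characterizing the retention performance of compounds under specific conditions and is a powerful compound-identification tool, particularly when dealing with complex mixtures. Consequently, the ability to predict RI values is a meaningful objective, particularly for multipolar phases, owing to significant variations in RI across various polar stationary phases. To address this issue, we developed a model for predicting gas-chromatographic RIs on stationary phases of varying polarity by collecting 4183 pieces of retention-index data for 2499 compounds on eight types of stationary phase from the literature and databases. Stationary phases were further classified into five categories based on their McReynolds constants, namely: strongly polar, polar, medium polar, weakly polar, and non-polar. This classification ensured that the model is capable of handling a wide range of polarities, thereby enhancing its versatility and applicability to various analytical scenarios. The predictive model was constructed by integrating two types of composite features. The 1D and 2D molecular-structural features of the compounds were first determined; these features capture the chemical and physical properties of the compounds, including their relative molecular masses, functional groups, and topological indices. These descriptors provide a comprehensive understanding of the molecular characteristics that influence retention behavior. Stationary-phase polarity was then one-hot encoded, which converted categorical stationary-phase-polarity information into a format that can be effectively used by machine-learning algorithms. This encoding technique ensures that the model can distinguish among the effects of various polarities on the retention behavior of the compounds. Nine algorithms were used to construct predictive machine-learning ...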
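The feature coupling and voting step described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the class order in `POLARITY_CLASSES`, the two-element descriptor vector, and the base models `model_a` and `model_b` (simple stand-in functions, not trained XGBoost or LightGBM regressors) are all assumptions made for illustration, and unweighted averaging is only one common form of voting regression.

```python
import math

# The five polarity classes named in the abstract; their order here is an assumption.
POLARITY_CLASSES = ["strongly polar", "polar", "medium polar", "weakly polar", "non-polar"]


def one_hot_polarity(polarity):
    """Return a 5-element 0/1 vector marking the stationary-phase polarity class."""
    if polarity not in POLARITY_CLASSES:
        raise ValueError(f"unknown polarity class: {polarity!r}")
    return [1.0 if polarity == name else 0.0 for name in POLARITY_CLASSES]


def build_features(descriptors, polarity):
    """Couple molecular descriptors with the one-hot polarity vector."""
    return list(descriptors) + one_hot_polarity(polarity)


def voting_predict(models, features):
    """Voting regression: the unweighted mean of the base models' predictions."""
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical stand-ins for two trained base regressors.
# x[0] is an illustrative molecular-mass descriptor; x[2] is the "strongly polar" flag.
def model_a(x):
    return 600.0 + 2.0 * x[0] + 400.0 * x[2]


def model_b(x):
    return 620.0 + 1.5 * x[0] + 380.0 * x[2]


if __name__ == "__main__":
    x = build_features([100.0, 1.0], "strongly polar")
    print(x)
    print(voting_predict([model_a, model_b], x))
```

The same compound gets a different feature vector on each polarity class, which is what lets a single model serve several stationary phases instead of one model per phase.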

Keywords: gas chromatographic retention index; ensemble learning; stationary phases with different polarities; McReynolds constants

Classification code: O658 [Science: Analytical Chemistry]

 
