Development of a predictive model for O-dealkylation of aromatic ether contaminants based on machine learning methods

Authors: CHENG Shiyang; MIN Hao; LIU Chunsheng; JI Li

Affiliations: [1] School of Environment and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116; [2] Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province, Hangzhou 310015; [3] School of Environmental Studies, China University of Geosciences, Wuhan 430074

Source: Acta Scientiae Circumstantiae (环境科学学报), 2024, No. 9, pp. 366-375 (10 pages)

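Funding: National Natural Science Foundation of China (Nos. 22106168, 22176211); International Cooperation and Exchange Program of the National Natural Science Foundation of China (No. 42361134581); Open Fund of the Key Laboratory of Pollution Exposure and Health Intervention of Zhejiang Province (No. 20230008); Fundamental Research Funds for the Central Universities (No. 2023QN1042).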

Abstract: Aromatic ethers are ubiquitous environmental pollutants that pose potential environmental health risks. In particular, O-dealkylation mediated by cytochrome P450 enzymes (P450) can affect the safety of their metabolic transformation. However, conventional experimental and computational chemistry methods are poorly suited to high-throughput screening of whether emerging aromatic ether contaminants undergo O-dealkylation. Machine learning is already widely used for source apportionment and toxicity screening of pollutants, but its application to screening key biotransformation pathways of organic pollutants is still rarely reported. Through database and literature searches, this study first constructed a data set of 390 emerging aromatic ether pollutants. Four machine learning methods (random forest, support vector machine, K-nearest neighbor, and gradient boosting decision tree) were then applied to develop binary classification models of O-dealkylation, based on eight screened molecular descriptors that characterize reactivity and structural fit. Among the single-algorithm models, random forest showed the highest prediction accuracy (83.3%) and the lowest false negative rate (6.4%). An ensemble model was then generated using a consensus strategy to integrate three different algorithms; its overall performance was better than that of any single algorithm, with an accuracy of 84.6% and a false negative rate of 6.4%. The classification model developed in this work can therefore provide methodological support for high-throughput screening of the O-dealkylation metabolic pathway of aromatic ether pollutants.
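The two ideas the abstract relies on, a consensus vote across several binary classifiers and the accuracy and false negative rate used to compare them, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the strict-majority voting rule, and the toy labels are hypothetical and are not taken from the paper, and the abstract does not say which denominator its false negative rate uses (this sketch uses the conventional FN / (FN + TP)). In practice the per-model predictions would come from trained classifiers, for example in scikit-learn.

```python
def consensus_vote(model_predictions):
    """Combine 0/1 predictions from several models by strict majority.

    model_predictions: list of equal-length lists, one per model.
    Assumption: ties (possible with an even number of models) count as 0.
    """
    n_models = len(model_predictions)
    combined = []
    for votes in zip(*model_predictions):
        combined.append(1 if sum(votes) * 2 > n_models else 0)
    return combined


def accuracy_and_fnr(y_true, y_pred):
    """Return (accuracy, false negative rate) for binary labels.

    The false negative rate here is FN / (FN + TP); this definition is
    an assumption, since the abstract does not specify one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fnr


# Toy labels (1 = O-dealkylation occurs, 0 = it does not); made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
rf_pred = [1, 1, 1, 0, 0, 0, 0, 1]
svm_pred = [1, 1, 0, 1, 0, 0, 1, 0]
knn_pred = [1, 0, 1, 1, 0, 1, 0, 0]

consensus = consensus_vote([rf_pred, svm_pred, knn_pred])

for name, pred in [("RF", rf_pred), ("SVM", svm_pred),
                   ("KNN", knn_pred), ("consensus", consensus)]:
    acc, fnr = accuracy_and_fnr(y_true, pred)
    print(f"{name}: accuracy={acc:.3f}, FNR={fnr:.3f}")
```

In this toy case each model errs on different compounds, so the vote corrects every individual mistake. Real classifiers tend to make correlated errors, so an ensemble gain is usually much smaller, as with the 83.3% versus 84.6% accuracy reported in the abstract.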

Keywords: machine learning; biotransformation; cytochrome P450 enzymes; aromatic ether pollutants; O-dealkylation; binary classification

Classification code: X132 [Environmental Science and Engineering: Environmental Science]

 
