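Affiliations: [1] College of Marine Ecology and Environment, Shanghai Ocean University, Shanghai 201306; [2] State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012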
Source: Asian Journal of Ecotoxicology (《生态毒理学报》), 2022, Issue 2, pp. 148-163 (16 pages)
Funding: Fundamental Research Funds for Central Public-Interest Scientific Research Institutes (2019YSKY-007, 2019YSKY-021).
Abstract: Reproductive toxicity tests for endocrine disruptor chemicals (EDCs) are lengthy and costly, so reproductive toxicity data for aquatic species remain relatively scarce, which limits the ecological risk assessment and management of EDCs. Predicting toxicity data is an important way to address this problem, and it is also one of the research hotspots and difficulties in ecotoxicology. Building on a review of domestic and international studies that use machine learning to predict the toxic effects of chemicals on aquatic organisms, this study used a support vector machine (SVM) model and a linear neural network (LNN) model, following the quantitative structure-activity relationship (QSAR) approach, to build binary classification models of toxic effect on a reproductive toxicity dataset for the fathead minnow (Pimephales promelas), and then validated and evaluated the models. The literature review showed that SVM is the most widely used model in studies that predict the aquatic toxicity of compounds with machine learning, followed by linear regression and neural networks. Acute toxicity has been predicted more often than chronic toxicity. There is no clear theoretical guidance for selecting molecular descriptors; selection usually combines experience with algorithms, and descriptors related to the octanol-water partition coefficient generally rank as highly important. The experimental results showed that screening yielded four descriptors as model input variables, related to atomic mass, polarizability, ionization potential and bond order, respectively. The prediction accuracies of SVM on the training set and the test set were 0.91 and 0.88, respectively, and the corresponding areas under the curve (AUC) obtained from the receiver operating characteristic (ROC) curve were 0.93 and 0.88. The accuracies of LNN on the training set and the test set were both 0.82, with AUC values of 0.87 and 0.88, respectively, indicating that both models have good generalization and predictive ability. SVM outperformed LNN, suggesting that SVM is better suited to modeling small-sample data. These results can serve as an important supplement to ecotoxicological research on EDCs and to the enrichment of toxicity data, and can provide a scientific reference for the ecological risk management of EDCs.
Keywords: endocrine disruptors; fathead minnow (Pimephales promelas); chronic toxicity; machine learning; QSAR
Classification No.: X171.5 [Environmental Science and Engineering: Environmental Science]
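
Note on the metrics: the abstract reports accuracy and ROC AUC for binary classifiers. The following is a minimal sketch of how these two quantities can be computed, using only the Python standard library. The labels and scores are made-up illustration values, not data from the paper, and the function names `accuracy` and `roc_auc` are hypothetical helpers, not the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = toxic effect observed, 0 = no effect (assumed coding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]   # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # threshold of 0.5 is an assumption

print(accuracy(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.9375
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two can differ for the same model, as they do for the values reported above.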