QSTR study on the toxicity of phenol compounds based on the classification of the samples by KNN and the K-means clustering algorithm

Times cited: 3


Authors: Zhang Yaxiong (张雅雄)[1], Yang Cairong (杨彩蓉), Li Qin (李琴)[1]

Affiliation: [1] School of Chemistry and Materials Science, Shanxi Normal University, Linfen, Shanxi 041000, China

Source: Computers and Applied Chemistry (《计算机与应用化学》), 2016, No. 3, pp. 359-361 (3 pages)

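Funding: Shanxi Province Returned Overseas Scholars Program (2014-045); Natural Science Foundation of Shanxi Province (2010011013-2); Shanxi Normal University Teaching Reform Program (SD2013JGXM-54)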

Abstract: Toxicity data for 258 phenol derivatives toward the aquatic ciliate Tetrahymena pyriformis were collected from the literature. Seven molecular structure descriptors were calculated and selected with the ADMEWORKS ModelBuilder software and used as the structural parameters of the samples. A robust diagnostic method was used to eliminate 24 outliers. Two classification methods, the K-nearest neighbors (KNN) method and the K-means clustering algorithm, were then applied separately to divide the remaining 234 samples into categories. Within each category, an external test set was selected at random, and the sphere exclusion algorithm was used to split the other samples into a training set and an internal test set. Quantitative structure-toxicity relationship (QSTR) models were then built with multiple linear regression (MLR), partial least squares (PLS), and artificial neural networks (ANN). The results show that the nonlinear model (ANN) predicts better than the linear models (MLR and PLS), and that the supervised classification method (KNN) gives better predictions than the unsupervised one (K-means clustering), which is therefore less suitable for the QSTR study in this paper.

Keywords: phenol compounds; K-nearest neighbors method; K-means clustering; quantitative structure-toxicity relationship

CLC numbers: TQ015.9 [Chemical Engineering]; TP391.9 [Automation and Computer Technology: Computer Application Technology]
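The abstract does not name the robust diagnostic method used to flag the 24 outliers. As a stand-in only, the sketch below applies a different, widely used robust rule, the median/MAD modified z-score of Iglewicz and Hoaglin (cutoff 3.5), to a list of values such as residuals from a preliminary fit. It is not the paper's procedure; `mad_outliers` and the toy residuals are hypothetical.

```python
import statistics

def mad_outliers(values, cutoff=3.5):
    """Return indices whose modified z-score exceeds `cutoff`.

    Stand-in technique (not the paper's method): modified z-score
    0.6745 * |v - median| / MAD, where MAD is the median absolute deviation.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: the score is undefined, so flag nothing
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > cutoff]

# Hypothetical residuals from a preliminary model; the last one is extreme
residuals = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
print(mad_outliers(residuals))  # → [5]
```

Median and MAD are used instead of mean and standard deviation because a single extreme sample cannot drag them toward itself.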
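The abstract contrasts a supervised classifier (KNN) with an unsupervised one (K-means) but gives no settings for either. As a minimal sketch of how KNN assigns a sample to a class, the code below assumes Euclidean distance over descriptor vectors and a plain majority vote among the k nearest labelled samples. The function `knn_predict`, the value of k, and the two-descriptor toy data are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` the majority label among its k nearest labelled samples.

    Assumptions: Euclidean distance in descriptor space; on a tied vote,
    Counter.most_common keeps first-encountered order, so the label of the
    closer neighbour wins.
    """
    # Distance from the query to every labelled sample, nearest first
    dists = sorted(((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
                   key=lambda pair: pair[0])
    # Majority vote among the k closest samples
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor samples with known class labels
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.05, 0.1)))  # → A
print(knn_predict(train_X, train_y, (5.0, 5.1)))   # → B
```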
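K-means, by contrast, needs no class labels: it alternates between assigning each sample to its nearest centroid and moving each centroid to the mean of its members. The sketch below is plain Lloyd's algorithm with centroids initialised from randomly drawn samples. The number of clusters and the initialisation used in the paper are not given here, so `kmeans`, the seed, the iteration cap, and the toy data are illustrative assumptions.

```python
import math
import random

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct samples drawn at random
    centroids = [list(p) for p in rng.sample(X, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(x, centroids[j]))
                      for x in X]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-descriptor samples forming two well-separated groups
X_toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, centroids = kmeans(X_toy, k=2)
print(labels)
```

Because the cluster labels are arbitrary integers, only the grouping is meaningful; which group is called 0 depends on the random initialisation.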
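Several variants of the sphere exclusion algorithm exist, and the abstract does not say which was used. The sketch below assumes one simple variant: the current centre joins the training set, every remaining sample within a radius R of it joins the internal test set, and the next centre is the remaining sample closest to the training set built so far. In the paper this split is applied only after the random external test set has been set aside within each class. `sphere_exclusion_split`, `radius`, and `start` are hypothetical names.

```python
import math

def sphere_exclusion_split(X, radius, start=0):
    """Split sample indices into (train, test) with a simple sphere-exclusion rule.

    Assumed variant: centre -> training set; other remaining samples within
    `radius` of the centre -> test set; next centre = remaining sample with
    the smallest distance to any training sample chosen so far.
    """
    remaining = set(range(len(X)))
    train, test = [], []
    centre = start
    while remaining:
        remaining.discard(centre)
        train.append(centre)
        # Samples inside the sphere around the centre are excluded into the test set
        inside = [i for i in remaining if math.dist(X[i], X[centre]) <= radius]
        test.extend(inside)
        remaining.difference_update(inside)
        if remaining:
            centre = min(remaining,
                         key=lambda i: min(math.dist(X[i], X[t]) for t in train))
    return sorted(train), sorted(test)

# Hypothetical one-descriptor samples (written as 2-D points on a line)
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.4, 0.0), (6.0, 0.0)]
print(sphere_exclusion_split(points, radius=1.0))  # → ([0, 2, 4], [1, 3])
```

A larger radius moves more samples into the test set; the radius therefore controls the training/test ratio.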
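Of the three modeling tools, MLR is the simplest to illustrate. The sketch below fits y = b0 + b1*x1 + ... + bp*xp by ordinary least squares, solving the normal equations (AᵀA) b = Aᵀy with Gauss-Jordan elimination in pure Python. `mlr_fit` and the noise-free toy data are hypothetical; a real fit on the seven descriptors would normally rely on a numerical library.

```python
def mlr_fit(X, y):
    """Ordinary least squares. Returns [b0, b1, ..., bp] for y ≈ b0 + Σ bj*xj."""
    A = [[1.0, *row] for row in X]  # design matrix with an intercept column
    n, p = len(A), len(A[0])
    # Augmented normal equations: each row i is [(AᵀA)_i0 ... (AᵀA)_i,p-1 | (Aᵀy)_i]
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
         + [sum(A[r][i] * y[r] for r in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # M is now diagonal in its first p columns
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2
X_toy = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y_toy = [1, 3, -2, 0, 2]
print(mlr_fit(X_toy, y_toy))  # approximately [1.0, 2.0, -3.0]
```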

 
