Correlation-based feature selection in QSAR studies of paralytic shellfish poisoning (PSP) toxins

Times cited: 3


Authors: Diao Ning (刁宁)[1], Zhang Yongqing (张永清)[1,2], Yi Xiaoyun (易筱筠)[1,2]

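Affiliations: [1] School of Environmental Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510006; [2] Key Laboratory of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, Guangzhou, Guangdong 510006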

Source: Computers and Applied Chemistry (《计算机与应用化学》), 2010, No. 6, pp. 811-815 (5 pages)

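Funding: Supported by the Natural Science Foundation of Guangdong Province (05300189) and the Guangdong Provincial Science and Technology Plan (2007A020100001-13)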

Abstract: QSAR models were built from 1,751 molecular structure descriptors and the median lethal concentrations of 27 paralytic shellfish poisoning (PSP) toxins. Variables were selected with correlation-based feature selection (CFS), and each candidate descriptor subset was evaluated by leave-one-out cross-validation. From the 1,751 descriptors, 43 were retained that correlate strongly with the target values but only weakly with one another. Principal component analysis was then used to reduce the dimension of this set, and the 10 extracted principal components served as the new variables of the QSAR model (principal component regression). The model gave a squared correlation coefficient R^2 = 0.891 and a cross-validated q^2 = 0.809 (0.801 in the English version of the abstract), indicating good fit and good predictive ability. The stability of the model was tested with the jackknife method: 88.9% of the correlation coefficients R fell between 0.94 and 0.95, indicating that the model is robust and reliable. The results show that CFS is well suited to screening hundreds or thousands of candidate variables: it removes irrelevant and redundant variables at the same time, which simplifies data processing, and it has broad prospects for application in QSAR modeling.
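To illustrate how CFS favors descriptors that are "highly correlated with the target but weakly intercorrelated", the sketch below scores a subset S of k descriptors with the standard CFS merit heuristic (Hall), Merit_S = k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean descriptor-target correlation and r_ff the mean pairwise descriptor-descriptor correlation. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes absolute Pearson correlation as the correlation measure and a simple greedy forward search, and it does not reproduce the paper's leave-one-out evaluation of subsets. The function names (pearson, cfs_merit, cfs_forward) are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a subset, given precomputed |r| tables (assumed measure)."""
    k = len(subset)
    if k == 0:
        return 0.0
    avg_cf = mean(r_cf[i] for i in subset)  # relevance to the target
    avg_ff = (mean(r_ff[i][j] for i, j in combinations(subset, 2))
              if k > 1 else 0.0)  # redundancy among the chosen descriptors
    return k * avg_cf / math.sqrt(k + k * (k - 1) * avg_ff)


def cfs_forward(columns, target):
    """Greedy forward search (a simplification): add the descriptor that
    raises the merit most; stop when no addition improves it."""
    n = len(columns)
    r_cf = [abs(pearson(c, target)) for c in columns]
    r_ff = [[abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    selected, best = [], 0.0
    while True:
        best_i, best_m = None, best
        for i in range(n):
            if i in selected:
                continue
            m = cfs_merit(selected + [i], r_cf, r_ff)
            if m > best_m:  # strict improvement; ties keep the lower index
                best_i, best_m = i, m
        if best_i is None:
            return selected, best
        selected.append(best_i)
        best = best_m
```

On a toy example where y = [2, 0, 0, -2] is the sum of two uncorrelated columns [1, -1, 1, -1] and [1, 1, -1, -1], plus a third column that duplicates the first, cfs_forward returns the indices [0, 1] with a merit of about 1.0: the duplicate is rejected because it adds redundancy without adding relevance.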

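Keywords: correlation-based feature selection; paralytic shellfish poisoning toxins; quantitative structure-activity relationship; QSAR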

CLC number: TQ015.9 [Chemical Engineering]; O6-39 [Science: Chemistry]

 
