Classification Models for Acetylcholinesterase Inhibitors Based on Machine Learning Methods

Authors: Yang Guobing[1], Li Zerong[2], Rao Hanbing[2], Li Xiangyuan[1], Chen Yuzong[3]

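Affiliations: [1] College of Chemical Engineering, Sichuan University, Chengdu 610065; [2] College of Chemistry, Sichuan University, Chengdu 610064; [3] Department of Pharmacy, National University of Singapore, Singapore 117543

Source: Acta Physico-Chimica Sinica, 2010, Issue 12, pp. 3351-3359 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (20973118)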

Abstract: A total of 1559 molecular descriptors, covering constitutional, charge-distribution, topological, geometrical, and physicochemical properties, were calculated to encode acetylcholinesterase inhibitors. Thirty-seven descriptors were selected with a hybrid filter/wrapper approach that combines Fisher score ranking with Monte Carlo simulated annealing. Classification models for acetylcholinesterase inhibitors were then built using support vector machine (SVM), artificial neural network (ANN), and k-nearest neighbor (k-NN) methods. For the 515 samples in the training set, five-fold cross-validation gave average prediction accuracies across the methods of 87.3%-92.7% for the positive samples, 67.0%-81.0% for the negative samples, and 79.4%-88.2% for all samples. A y-scrambling test of the SVM model gave average prediction accuracies of 72.7%-82.5%, 41.0%-53.0%, and 62.1%-69.1% for the positive, negative, and total samples, respectively. These values are clearly lower than those of the actual model, indicating that the model is not the result of chance correlation. On an external test set of 172 samples that were not used for model building, the methods achieved prediction accuracies of 93.3%-100.0%, 74.6%-89.6%, and 86.1%-95.9% for the positive, negative, and total samples, respectively. Among the models built, the SVM model gave the best prediction accuracy, clearly higher than results reported in other literature.
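The filter step and the per-class accuracies reported in the abstract can be illustrated with a small sketch. It assumes the common two-class form of the Fisher score, (mean_pos - mean_neg)^2 / (var_pos + var_neg); the abstract does not state the exact formula used, and the function names and toy data below are hypothetical. The Monte Carlo simulated annealing wrapper and the SVM, ANN, and k-NN models are not reproduced here.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score of one descriptor (assumed form, see lead-in)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # No within-class variation: perfectly separating or uninformative.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_descriptors(X, labels):
    """Return descriptor (column) indices sorted by decreasing Fisher score."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def class_accuracies(y_true, y_pred):
    """Accuracy (%) on positive, negative, and all samples."""
    pairs = list(zip(y_true, y_pred))
    pos = [t == p for t, p in pairs if t == 1]
    neg = [t == p for t, p in pairs if t == 0]
    total = [t == p for t, p in pairs]
    return tuple(100.0 * sum(hits) / len(hits) for hits in (pos, neg, total))


# Hypothetical toy data: 6 molecules, 2 descriptors; label 1 = inhibitor.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
     [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [1, 1, 1, 0, 0, 0]

print(rank_descriptors(X, y))                        # descriptor 0 separates the classes
print(class_accuracies([1, 1, 0, 0], [1, 0, 0, 0]))  # (positive, negative, total)
```

In a filter/wrapper scheme of this kind, the ranking would typically be used to discard low-scoring descriptors before the more expensive wrapper search runs on the remainder.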

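Keywords: acetylcholinesterase inhibitor; machine learning method; variable selection; applicability domain

Classification number: R749.16 [Medicine and Health: Neurology and Psychiatry]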

 
