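Affiliations: [1] State Key Laboratory of Crop Biology, College of Life Sciences, Shandong Agricultural University, Tai'an 271018, Shandong, China [2] Key Laboratory of Green Chemistry and Technology, Ministry of Education, College of Chemistry, Sichuan University, Chengdu 610064, China [3] State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, China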
Source: Acta Physico-Chimica Sinica (《物理化学学报》), 2011, Issue 6, pp. 1407-1416 (10 pages)
Funding: Supported by the National Key Basic Research Program of China (2009CB118500) and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, Ministry of Education, China (20071108-18-15).
Abstract: Non-structural protein 5B (NS5B) is the RNA-dependent RNA polymerase of the hepatitis C virus (HCV) and plays an important role in viral gene replication and protein maturation. Inhibiting NS5B polymerase blocks HCV RNA replication and is therefore an effective approach to treating hepatitis C. Computational methods for virtual screening and prediction of NS5B polymerase inhibitors are becoming increasingly important. This work uses three machine learning (ML) methods, support vector machine (SVM), k-nearest neighbor (k-NN), and C4.5 decision tree (C4.5 DT), to build classification models that distinguish known NS5B inhibitors (NS5BIs) from non-inhibitors. The prediction system was tested using 1248 structurally diverse compounds (552 NS5BIs and 696 non-NS5BIs), which are significantly more diverse in chemical structure than those used in other studies. Recursive feature elimination was used to select the molecular descriptors relevant to distinguishing NS5BIs from non-NS5BIs and thereby improve prediction accuracy. On the independent validation set, the prediction accuracies across the three methods were 81.4%-91.7% for NS5BIs, 78.2%-87.2% for non-NS5BIs, and 84.1%-85.0% overall. SVM gave the best accuracy for NS5BIs (91.7%), C4.5 DT gave the best accuracy for non-NS5BIs (87.2%), and k-NN gave the best overall accuracy (85.0%). These results suggest that machine learning methods can effectively predict potential NS5BIs in unknown sets of compounds and help identify the molecular descriptors associated with NS5B inhibition.
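The abstract reports three figures per model: accuracy on inhibitors, accuracy on non-inhibitors, and overall accuracy. As a minimal sketch of how such figures are conventionally computed from a binary classifier's predictions, the Python below may help. The function name `class_accuracies` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
def class_accuracies(y_true, y_pred):
    """Return (inhibitor accuracy, non-inhibitor accuracy, overall accuracy).

    Labels are assumed to be 1 for an NS5B inhibitor and 0 for a non-inhibitor.
    """
    pairs = list(zip(y_true, y_pred))
    # Correctly predicted inhibitors and non-inhibitors.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(y_true)


# Hypothetical toy labels, NOT taken from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

inhibitor_acc, non_inhibitor_acc, overall_acc = class_accuracies(y_true, y_pred)
print(f"inhibitors: {inhibitor_acc:.1%}")
print(f"non-inhibitors: {non_inhibitor_acc:.1%}")
print(f"overall: {overall_acc:.1%}")
```

Reporting the two per-class figures next to the overall one matters when classes are unbalanced, as with the 552 inhibitors and 696 non-inhibitors here: a model can score well overall while doing poorly on one class.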
Keywords: machine learning methods; molecular descriptors; recursive feature elimination; support vector machine; hepatitis C virus
Classification code: R373 [Medicine and Health: Pathogen Biology]