Authors: ZHONG Cai (钟彩); YANG Ya-xin (杨亚鑫); WANG Jing-de (王璟德)[1]; SUN Wei (孙巍)[1] (College of Chemical Engineering, Beijing University of Chemical Technology, Beijing 100029, China)
Source: Chemical Research and Application (《化学研究与应用》), 2022, No. 10, pp. 2350-2356 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (21878012).
Abstract: This work explores suitable feature screening methods for identifying antitumor drugs with different machine learning methods. A total of 200 antitumor drugs and 600 non-antitumor drugs were collected and arranged into three different balanced datasets. Correlation-based feature screening was carried out by combining correlation matrices computed with the Spearman coefficient and the Tanimoto coefficient with six feature importance indicators. The screened datasets were then classified with AdaBoost Tree (ABT), Random Forest (RF), and Support Vector Machine (SVM). On all three balanced datasets, the applied feature screening improved the evaluation metrics of each machine learning method to varying degrees. For ABT in particular, compared with the other screening operations, at least one of the improved feature screening methods raised all six evaluation metrics. Analysis of the results on the three datasets shows that, among the importance indicators, overall variance and information entropy performed better, which offers a reference for future identification of antitumor drugs.
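The abstract gives no procedural detail, so the following is only a minimal sketch of how correlation-based screening with an importance indicator might look. It makes several assumptions: variance stands in for the six importance indicators, the 0.9 threshold is arbitrary, and the function names `spearman_matrix`, `tanimoto`, and `screen_features` are hypothetical. From each highly correlated pair of features, the one with the lower importance is dropped. It is not the authors' implementation.

```python
import numpy as np

def spearman_matrix(X):
    """Spearman correlation between the columns of X.

    Ranks come from a double argsort, so tied values are ranked by position
    rather than averaged (a simplification for this sketch).
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def tanimoto(a, b):
    """Tanimoto coefficient between two binary vectors (e.g. fingerprints)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def screen_features(X, threshold=0.9):
    """Return indices of features kept after correlation screening.

    Assumption: importance is the per-feature variance; any other
    indicator could be substituted for `importance`.
    """
    corr = np.abs(spearman_matrix(X))
    importance = X.var(axis=0)
    keep = np.ones(X.shape[1], dtype=bool)
    # Visit features from most to least important, so the feature that
    # survives a correlated pair is always the more important one.
    for i in np.argsort(-importance):
        if not keep[i]:
            continue
        for j in range(X.shape[1]):
            if j != i and keep[j] and corr[i, j] > threshold:
                keep[j] = False
    return np.flatnonzero(keep)

# Toy data: column 1 is a monotone transform of column 0, column 2 is unrelated.
x0 = np.arange(10.0)
x1 = x0 ** 3
x2 = np.array([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0, 5.5, 3.5])
X_demo = np.column_stack([x0, x1, x2])
print(screen_features(X_demo))
```

The retained columns could then be passed to any classifier, such as the AdaBoost, random forest, and SVM models named in the abstract. For binary descriptors, the matrix from `spearman_matrix` could be replaced by a matrix of pairwise `tanimoto` values between feature columns.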