A study on the impact of feature selection on antitumor drug identification  (Cited by: 1)


Authors: ZHONG Cai; YANG Ya-xin; WANG Jing-de[1]; SUN Wei[1] (College of Chemical Engineering, Beijing University of Chemical Technology, Beijing 100029, China)

Affiliation: [1] College of Chemical Engineering, Beijing University of Chemical Technology, Beijing 100029, China

Source: Chemical Research and Application (《化学研究与应用》), 2022, No. 10, pp. 2350-2356 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (21878012).

Abstract: Suitable feature selection methods for identifying antitumor drugs are explored using several machine learning methods. A total of 200 antitumor drugs and 600 non-antitumor drugs were collected and assembled into three different balanced datasets. Correlation-based feature selection was carried out by combining correlation matrices, computed with the Spearman coefficient and the Tanimoto coefficient, with six feature-importance indicators. The filtered datasets were then classified with AdaBoost trees (ABT), random forest (RF), and support vector machine (SVM). On all three balanced datasets, the feature selection methods improved the evaluation metrics of each machine learning method to varying degrees. For ABT in particular, compared with the other selection procedures, at least one of the improved feature selection methods raised all six evaluation metrics. Across the three datasets, overall variance and information entropy performed best among the importance indicators, which provides a reference for future antitumor drug identification.
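The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way a correlation matrix and an importance indicator could be combined on binary fingerprint features. The greedy keep-or-drop rule, the 0.9 threshold, and all function names are assumptions made for illustration, not the authors' method. Only the Tanimoto coefficient is shown for the correlation step, and only variance and information entropy for the importance step.

```python
import math

def tanimoto(a, b):
    """Tanimoto coefficient between two binary feature columns."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def entropy(col):
    """Shannon entropy (in bits) of a binary feature column."""
    p = sum(col) / len(col)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def variance(col):
    """Population variance of a feature column."""
    mean = sum(col) / len(col)
    return sum((x - mean) ** 2 for x in col) / len(col)

def select_features(X, importance=entropy, threshold=0.9):
    """Hypothetical greedy filter (an assumption, not the paper's rule).

    Features are visited from most to least important; a feature is kept
    only if its Tanimoto coefficient with every already-kept feature is
    below `threshold`. Returns the kept column indices in ascending order.
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    order = sorted(range(n_features),
                   key=lambda j: importance(cols[j]), reverse=True)
    kept = []
    for j in order:
        if all(tanimoto(cols[j], cols[k]) < threshold for k in kept):
            kept.append(j)
    return sorted(kept)

# Toy fingerprint matrix: rows are molecules, columns are fingerprint bits.
# Column 1 duplicates column 0, so one of the two is dropped.
X = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
print(select_features(X))  # [0, 2, 3]
```

Passing `importance=variance` swaps in the other indicator named in the abstract. The reduced matrix could then be handed to any of the three classifiers mentioned (ABT, RF, SVM).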

Keywords: feature selection; correlation matrix; importance indicators; molecular fingerprints; machine learning

Classification code: R95 [Medicine and Health: Pharmacy]

 
