Quantitative Prediction of Biological Activity of Anti-Breast Cancer Drug Candidates Based on the Light-GBM Algorithm


Author: Xuanqing Mao (毛轩晴), Qixin College, Zhejiang Sci-Tech University, Hangzhou, Zhejiang

Affiliation: [1] Qixin College, Zhejiang Sci-Tech University, Hangzhou, Zhejiang

Source: Operations Research and Fuzziology (《运筹与模糊学》), 2024, No. 4, pp. 514-523 (10 pages)

Abstract: The estrogen receptor α subtype (ERα) is an important target for endocrine therapy of breast cancer, so compounds that antagonize ERα activity may be candidate drugs for treating the disease. This study first preprocesses the data: the K value for K-means clustering is determined using the elbow method and the silhouette coefficient, the data are then clustered, and the results are visualized with Andrews curves. Variance filtering and the random forest method are used to rank the molecular descriptors by importance, and Pearson correlation analysis is applied to the initially screened variables, yielding the 20 molecular descriptors that have the most significant and most mutually independent effects on biological activity. A quantitative prediction model for the ERα bioactivity of compounds is then built with the Light-GBM algorithm, with the dataset divided into a training set and a test set at a ratio of 4:1. On the test set the model achieves an MSE of 0.468, an RMSE of 0.684, an MAE of 0.499, and an R-square of 0.788. The model shows good predictive accuracy, can speed up the development of new drugs, and contributes to research on the mechanisms by which breast cancer occurs and develops.
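The four test-set metrics quoted in the abstract follow standard definitions, which can be sketched in plain Python. This is a minimal illustration, not the author's code: the function name `regression_metrics` and the sample values are hypothetical, and only the reported MSE (0.468) and RMSE (0.684) come from the abstract.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot  # fails with ZeroDivisionError if y_true is constant
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Hypothetical activity values, for illustration only (not data from the paper).
y_true = [6.0, 7.0, 8.0, 9.0]
y_pred = [6.5, 6.5, 8.0, 9.5]
metrics = regression_metrics(y_true, y_pred)

# Consistency check on the reported test-set figures: RMSE should equal sqrt(MSE).
reported_mse, reported_rmse = 0.468, 0.684
rmse_is_consistent = abs(math.sqrt(reported_mse) - reported_rmse) < 1e-3
```

The final check confirms that the reported RMSE agrees with the square root of the reported MSE to three decimal places, so the two figures are internally consistent.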

Keywords: random forest; variance filtering; Light-GBM; distance correlation coefficient; K-means clustering; elbow method; silhouette coefficient

Classification: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
