Affiliation: [1] School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou
Source: 《运筹与模糊学》 (Operations Research and Fuzziology), 2022, No. 1, pp. 68-80 (13 pages)
Abstract: Compounds that antagonize ERα activity are candidate drugs for treating breast cancer, so studying them is of great significance for overcoming the disease. This paper proposes a sequence of methods for data preprocessing, feature selection, and model prediction on experimental data for candidate breast cancer drugs, with the objective of obtaining new compound molecules with better biological activity. An abnormal-sample detection model based on k-means clustering and Andrews curves is used to remove abnormal samples. The 729 molecular descriptors in the sample are then screened, and the 20 descriptors with the most significant influence on biological activity are retained; for this step, five methods drawn from three categories of feature screening are combined into a multi-dimensional weighted feature extraction model. Finally, a QSAR model of the compounds' biological activity against ERα is built: with pIC50 as the dependent variable and the 20 selected molecular descriptors as independent variables, XGBoost and LightGBM machine learning models are established, grid search is used to obtain the optimal model parameters, and the more effective model's prediction results are retained.
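The abstract does not say how k-means clustering and Andrews curves are combined, so the following is only a minimal sketch of the Andrews-curve half, using the standard definition f_x(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + x5·cos(2t) + ... The helper names (`andrews_curve`, `sample_curve`, `flag_outliers`), the median-based threshold rule, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews curve of feature vector x at angle t."""
    value = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # frequency sequence: 1, 1, 2, 2, 3, 3, ...
        # odd positions use sine terms, even positions use cosine terms
        value += xi * (math.sin(k * t) if i % 2 == 1 else math.cos(k * t))
    return value


def sample_curve(x, n_points=50):
    """Sample the Andrews curve of x on an even grid over [-pi, pi]."""
    step = 2 * math.pi / (n_points - 1)
    return [andrews_curve(x, -math.pi + j * step) for j in range(n_points)]


def flag_outliers(samples, threshold=2.0):
    """Hypothetical rule: flag a sample when the RMS distance of its curve
    from the mean curve exceeds `threshold` times the median distance."""
    curves = [sample_curve(x) for x in samples]
    n = len(curves)
    m = len(curves[0])
    mean_curve = [sum(c[j] for c in curves) / n for j in range(m)]
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(c, mean_curve)) / m)
        for c in curves
    ]
    median = sorted(dists)[n // 2]
    return [d > threshold * median for d in dists]


# Toy data (invented): four similar samples and one clearly different one.
samples = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [0.9, 1.9, 3.1],
    [1.0, 2.05, 3.0],
    [10.0, -8.0, 12.0],
]
flags = flag_outliers(samples)
```

In practice the descriptors would normally be standardized first, since Andrews curves are sensitive to feature scale and ordering; a clustering step such as k-means could then be applied per cluster rather than against a single global mean curve.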
Classification: TP3 [Automation and Computer Technology - Computer Science and Technology]