Authors: HU Xuemei (胡雪梅) [1,2], LI Jiali (李佳丽), JIANG Huifeng (蒋慧凤)
Affiliations: [1] School of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing 400067; [2] Chongqing Key Laboratory of Social Economy and Applied Statistics, Chongqing 400067; [3] Research Center for Economy of Upper Reaches of the Yangtze River, Chongqing 400067
Source: Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), 2022, No. 2, pp. 417-433 (17 pages)
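Funding: Supported by the Fifth Batch of the Chongqing Excellent Talents Support Program for Higher Education Institutions ("Predicting the Trend Movement of Stock Prices Based on Classification Methods"); the General Program of Basic Research and Frontier Exploration of the Chongqing Science and Technology Commission (cstc.2018jcyjA2073); the Chongqing "Statistics" Graduate Supervisor Team (yds183002); the Chongqing Social Science Planning Project (2019WT59); the Open Project of the Chongqing Key Laboratory of Social Economy and Applied Statistics (KFJJ2018066); the Major Project of the Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-M202100801); and the Mathematical Statistics Team of Chongqing Technology and Business University (ZDPTTD201906).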
Abstract: Liver cancer has the second highest fatality rate among all cancers. Because machine learning methods can improve the accuracy of disease prediction, this paper applies them to the early diagnosis of liver cancer in order to improve prediction accuracy. First, 10 indicators affecting liver cancer are selected as predictors, and 579 liver cancer patients are divided into two groups: 492 randomly selected patients form the training sample, and the remaining 87 patients form the testing sample. Next, the training sample is used to build six classifiers: logistic regression, L2-penalized logistic regression, Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Artificial Neural Network (ANN), and eXtreme Gradient Boosting (XGBoost). For logistic regression and L2-penalized logistic regression, the Newton-Raphson algorithm is used to obtain iteratively reweighted least squares estimates of the model parameters, to estimate the probability that a patient's tumor cells are malignant or benign, and to determine the optimal threshold for predicting tumor type. Finally, the testing sample is used to compute the confusion matrix, sensitivity, and specificity, and ROC curves are drawn to evaluate prediction accuracy. The results show that L2-penalized logistic regression has the highest prediction accuracy, followed by SVM in second place, XGBoost in third, logistic regression in fourth, and GBDT in fifth, while ANN and random forest have the lowest prediction accuracy.
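The abstract names the estimation and evaluation steps without giving formulas, so the following is only a minimal sketch of what Newton-Raphson (iteratively reweighted least squares) fitting of an L2-penalized logistic regression, followed by threshold selection and sensitivity/specificity, might look like. Everything specific here is an assumption rather than the authors' method: the function names, the penalty weight `lam`, the unpenalized intercept, the use of Youden's J statistic to pick the threshold, and the synthetic data (the paper's 10 indicators and 579 patients are not reproduced).

```python
import numpy as np

def fit_l2_logistic_irls(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Newton-Raphson (IRLS) fit of L2-penalized logistic regression.

    Maximizes loglik(beta) - (lam / 2) * ||beta[1:]||^2.
    Assumption: the intercept beta[0] is left unpenalized.
    """
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])      # prepend intercept column
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0                       # no penalty on the intercept
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        eta = np.clip(Xb @ beta, -30.0, 30.0)
        mu = 1.0 / (1.0 + np.exp(-eta))       # fitted probabilities
        w = mu * (1.0 - mu)                   # IRLS weights
        grad = Xb.T @ (y - mu) - penalty @ beta
        hess = Xb.T @ (Xb * w[:, None]) + penalty
        step = np.linalg.solve(hess, grad)    # Newton direction
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_proba(X, beta):
    """Estimated probability of the positive (here: malignant) class."""
    eta = np.clip(beta[0] + X @ beta[1:], -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-eta))

def sensitivity_specificity(y_true, prob, threshold):
    """Sensitivity and specificity from the 2x2 confusion matrix."""
    pred = prob >= threshold
    pos = y_true == 1
    tp = np.sum(pred & pos)
    fn = np.sum(~pred & pos)
    tn = np.sum(~pred & ~pos)
    fp = np.sum(pred & ~pos)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold_youden(y_true, prob):
    """Assumed criterion: maximize Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(prob):
        sens, spec = sensitivity_specificity(y_true, prob, t)
        if sens + spec - 1.0 > best_j:
            best_t, best_j = t, sens + spec - 1.0
    return best_t

# Synthetic demonstration data (hypothetical, not the paper's data set).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
true_prob = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1])))
y_demo = (rng.uniform(size=200) < true_prob).astype(float)

beta_hat = fit_l2_logistic_irls(X_demo, y_demo, lam=1.0)
prob_hat = predict_proba(X_demo, beta_hat)
threshold = best_threshold_youden(y_demo, prob_hat)
sens, spec = sensitivity_specificity(y_demo, prob_hat, threshold)
print("threshold:", threshold, "sensitivity:", sens, "specificity:", spec)
```

In a setting like the one the abstract describes, the threshold would be chosen on the training sample and the confusion matrix computed on the held-out testing sample; the sketch evaluates on the same synthetic data only to stay short.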
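Keywords: L2-penalized logistic regression; support vector machine; gradient boosting decision tree; artificial neural network; extreme gradient boosting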