Authors: ZHANG Minghuan (章鸣嬛)[1]; ZHANG Xuan (张璇); GUO Xin (郭欣)[1]; CHEN Ying (陈瑛) (Research Center of Big Data Analyses and Process, Shanghai Sanda University, Shanghai 201209)
Source: Beijing Biomedical Engineering (《北京生物医学工程》), 2019, No. 5, pp. 486-491, 497 (7 pages)
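Funding: Supported by the 2016 Shanghai Key Research Project for Private Universities (2016-SHNGE-01ZD) and the 2015 IBM University Relations Joint Research Project (D-2111-15-001)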
Abstract: Objective: Using breast cancer data from 1990 to 2014 in the SEER database, this study applies machine learning methods to analyze the prognostic factors of breast cancer, with the aim of helping physicians assess patient prognosis. Methods: On the advice of clinicians, 12 fields were selected as model inputs, and 5-year postoperative survival status was used as the model output. Prognostic factors were first screened by univariate statistical analysis; logistic regression and decision tree, two machine learning classification algorithms, were then used to build models and identify the factors affecting the 5-year prognosis of breast cancer. The sample data were organized by ten-fold cross-validation and balanced with oversampling and undersampling; sensitivity, specificity, and the area under the ROC curve (AUC) served as the evaluation metrics. Results: Among the 12 input fields, tumor stage, tumor grade, tumor size, estrogen level, age group, and progesterone level had a large influence on breast tumor prognosis. For both models, sensitivity and specificity on the test set were between 74.2% and 78.2%, and AUC was between 0.838 and 0.850. Conclusions: Optimized prognostic models built with logistic regression and decision tree algorithms can assist physicians in judging patient prognosis and treatment effect.
Keywords: SEER database; breast cancer; logistic regression; decision tree; prognostic factors
Classification: R318.04 [Medicine and Health: Biomedical Engineering]; Q334 [Medicine and Health: Basic Medicine]
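
The abstract names three evaluation metrics (sensitivity, specificity, AUC) and class balancing by undersampling. The sketch below shows one plain-Python way these quantities could be computed; it is not the authors' code, the function names (`sensitivity_specificity`, `auc`, `undersample`) are hypothetical, and the toy labels and scores are invented for illustration rather than taken from SEER.

```python
import random

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, n) + rng.sample(neg, n))
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Invented toy data: 3 positives, 5 negatives, with model scores in [0, 1].
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
balanced_rows, balanced_labels = undersample(scores, y_true)
```

In a ten-fold setup like the one the abstract describes, balancing would normally be applied to the training folds only, so that the test-fold metrics reflect the original class ratio; whether the authors did this is not stated in the record.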