Authors: ZHANG Jijie, QIN Qinghong [2], LIU Xueping [3], WANG Kangquan, WEI Wei
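Affiliations: [1] College of Science, Guangxi University of Science and Technology, Liuzhou 545006, Guangxi, China; [2] Affiliated Cancer Hospital, Guangxi Medical University, Nanning 530021, Guangxi, China; [3] Medical School, Guangxi University of Science and Technology, Liuzhou 545005, Guangxi, China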
Source: Journal of Guangxi University of Science and Technology, 2022, No. 1, pp. 101-109 (9 pages)
Funding: Supported by the Guangxi Natural Science Foundation (2019GXNSFAA245067).
Abstract: To predict the 5-year survival status of breast cancer patients and analyze its influencing factors, breast cancer data from 2004 to 2010 were first selected from the SEER database and the selected features were preprocessed. At the data level, SMOTE oversampling was applied to address class imbalance; at the algorithm level, LightGBM, CatBoost and GBDT were compared on how well they predict 5-year survival status. Finally, the influencing factors were ranked by importance and interpreted using SHAP values. The prediction model constructed in the paper outperforms a single model, with accuracy, AUC, recall, precision and F_1 score of 0.9060, 0.8443, 0.9837, 0.9160 and 0.9487, respectively. Five-year survival status is strongly associated with tumor size, total number of lymph nodes examined, number of positive lymph nodes, estrogen receptor (ER) status, progesterone receptor (PR) status, and age. The important features selected by the model are consistent with current clinical findings, so the model can provide some technical support for clinical prognosis prediction.
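The abstract names SMOTE oversampling as the remedy for class imbalance but does not describe the procedure. The following is a minimal sketch of the core SMOTE idea only: each synthetic sample is an interpolation between a minority-class point and one of its k nearest minority-class neighbours. It is not the authors' implementation; the function name `smote`, its parameters, and the toy two-feature data are assumptions made for illustration, and a real study would more likely rely on an established library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class with two made-up numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points), "synthetic samples generated")
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing records.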