Construction of a postoperative prognostic prediction model for primary liver cancer based on the SMOTE algorithm and machine learning models


Authors: PAN Bi; YU Jinghua; HUANG Yixian; WU Yazhou[1]; LI Fang[1] (Department of Health Statistics, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China)

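Affiliation: [1] Department of Health Statistics, Faculty of Military Preventive Medicine, Army Medical University (Third Military Medical University), Chongqing 400038, China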

Source: Journal of Army Medical University (《陆军军医大学学报》), 2024, No. 19, pp. 2236-2240 (5 pages)

Funding: National Natural Science Foundation of China, General Program (82173621, 81872716).

Abstract: Objective: To construct a postoperative prognostic prediction model for primary liver cancer based on the synthetic minority over-sampling technique (SMOTE) algorithm and machine learning models. Methods: A retrospective cohort study was conducted on 4297 patients with primary liver cancer from the Surveillance, Epidemiology, and End Results (SEER) database of the U.S. National Cancer Institute. The collected data were preprocessed with one-hot encoding and mean imputation (the journal's English abstract states multiple imputation), and the SMOTE algorithm was used to address class imbalance. Clinical variables were entered into machine learning models, and prognostic prediction models (SMOTE+DT/RF/GBDT/XGBoost) were built with decision tree (DT), random forest (RF), gradient boosting decision tree (GBDT), and eXtreme Gradient Boosting (XGBoost); the best model was selected by comparing their performance. Finally, a prognostic analysis system for primary liver cancer was developed from the optimal model and visualized. Results: The combined SMOTE+RF model showed the best predictive performance: its area under the receiver operating characteristic (ROC) curve (AUC), accuracy, and precision were all higher than those of the other models, at 0.895, 0.811, and 0.806, respectively. Conclusion: The SMOTE+RF prognostic prediction model can effectively predict the survival outcome of patients with primary liver cancer.
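To illustrate the oversampling step named in the abstract, the following is a minimal pure-Python sketch of SMOTE-style interpolation. It is not the authors' implementation: the function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are assumptions made for illustration, and a real analysis would more likely rely on an established library. Each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    minority: list of equal-length numeric feature vectors, minority class only.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (invented): 6 majority samples (label 0), 3 minority (label 1).
X_major = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
X_minor = [[2.0, 2.0], [2.5, 2.1], [2.2, 2.6]]

# Generate enough synthetic minority samples to equalise the class counts.
X_new = smote(X_minor, n_new=len(X_major) - len(X_minor), k=2)
X_balanced = X_major + X_minor + X_new
y_balanced = [0] * len(X_major) + [1] * (len(X_minor) + len(X_new))
```

Because every synthetic point lies between two existing minority samples, the new samples stay inside the region the minority class already occupies. Oversampling of this kind is usually applied to the training split only, so that evaluation metrics such as AUC, accuracy, and precision are computed on unmodified data.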

Keywords: primary liver cancer; synthetic minority over-sampling technique (SMOTE) algorithm; machine learning; prediction model

Classification No.: R319 [Medicine and Health: Basic Medicine]; R730.7; R735.7

 
