Comparison of three data mining methods in predicting 5-year survival of colorectal cancer patients  


Authors: Luo Yan, Sun Yawei, Fu Qunchao, Xue Tengfei, Zhou Ping

Affiliations: [1] School of Software Engineering, Beijing University of Posts and Telecommunications; [2] Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education

Source: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报(英文版)), 2018, Issue 6, pp. 65-73 (9 pages)

Funding: Supported by the National Key Research and Development Program of China (2017YFC1307705)

Abstract: Predicting the survivability of colorectal cancer (CRC) patients has long been a challenging research problem. Given the importance of survival prediction for CRC patients, we compared the performance of three data mining methods, decision trees (DTs), artificial neural networks (ANNs) and support vector machines (SVMs), in predicting 5-year survival of CRC patients, with the aim of assisting clinicians in making treatment decisions. The CRC dataset used to build the prediction models comes from the Surveillance, Epidemiology, and End Results (SEER) program. Five-fold cross-validation was used to measure predictive accuracy, and the random forest algorithm was used to measure the importance of features. Experimental results show that the predictive accuracies of ANNs (0.73) and SVMs (0.75) were higher than that of DTs, and that these two methods also achieved the best area under the receiver operating characteristic (ROC) curve (AUC = 0.82). This result may indicate that ANNs and SVMs have high predictive power for the 5-year survival of CRC patients.
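
The abstract reports accuracy and AUC measured with 5-fold cross-validation but gives no implementation details. The following is a minimal sketch of that evaluation protocol only, in plain Python. All function names are hypothetical, and the midpoint classifier is a toy stand-in for illustration; it is not one of the DT, ANN or SVM models from the paper, and no SEER data is used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=None):
    """Split indices 0..n_samples-1 into k near-equal folds.

    If seed is None the original order is kept; otherwise indices are
    shuffled reproducibly before splitting.
    """
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5. Requires both classes present."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("roc_auc needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, fit, score, k=5, threshold=0.0, seed=None):
    """Return (mean accuracy, mean AUC) over k folds.

    fit(X_train, y_train) -> model and score(model, x) -> float are
    caller-supplied; a score >= threshold is predicted as class 1.
    """
    folds = k_fold_indices(len(y), k, seed)
    accs, aucs = [], []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_test = [y[j] for j in test_idx]
        scores = [score(model, X[j]) for j in test_idx]
        preds = [1 if s >= threshold else 0 for s in scores]
        accs.append(accuracy(y_test, preds))
        aucs.append(roc_auc(y_test, scores))
    return sum(accs) / k, sum(aucs) / k


# Toy stand-in model (assumption, not from the paper): one numeric feature,
# decision boundary at the midpoint between the two class means.
def fit_midpoint(X_train, y_train):
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score_midpoint(model, x):
    return x - model
```

Calling `cross_validate(X, y, fit_midpoint, score_midpoint)` on a one-feature dataset returns the fold-averaged accuracy and AUC. In a real comparison, `fit` and `score` would wrap each of the three classifiers, and folds would typically be stratified so that every fold contains both survival classes.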

Keywords: data mining; 5-year survival; CRC; SEER

Classification code: TN [Electronics and Telecommunications]

 
