XLC-Stacking Method for Disease Diagnosis Based on XGBoost Feature Selection  Cited by: 20

Authors: YUE Peng (岳鹏); HOU Lingyan (侯凌燕) [1]; YANG Dali (杨大利) [1]; TONG Qiang (佟强)

Affiliation: [1] Open Computer System Laboratory, Beijing Information Science and Technology University, Beijing 100101, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2020, No. 17, pp. 136-141 (6 pages)

Funding: National Natural Science Foundation of China (No. 6177010360).

Abstract: To address feature redundancy in medical disease data, the XGBoost feature selection method is used to measure feature importance, delete redundant features, and select the best features for classification. To address low recognition accuracy, the Stacking method is used to integrate XGBoost, LightGBM, and other heterogeneous classifiers, and the better-performing CatBoost classifier is introduced among them to improve the classification accuracy of the ensemble. To avoid overfitting, the class probabilities output by the base-level classifiers are used as the input to the higher-level classifier. Experimental results show that the proposed XLC-Stacking method based on XGBoost feature selection improves substantially over current mainstream classification algorithms, over XGBoost alone, and over the Stacking method alone. It reaches a recognition accuracy of 97.73% and an F1-Score of 98.21%, which makes it better suited to disease diagnosis.
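
The two stages described in the abstract (importance-based feature selection, then stacking on base-classifier probabilities) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn only, with GradientBoostingClassifier, RandomForestClassifier, and ExtraTreesClassifier standing in for XGBoost, LightGBM, and CatBoost; the median importance threshold, the logistic-regression meta-classifier, the 5-fold setting, and the public breast-cancer demo dataset are all assumptions that do not come from the record.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Public demo dataset (assumption): NOT the data used in the paper.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: rank features by a boosted-tree model's importances and keep
# those at or above the median importance (threshold is an assumption).
# GradientBoostingClassifier is a stand-in for XGBoost here.
selector = SelectFromModel(
    GradientBoostingClassifier(random_state=0), threshold="median"
)

# Stage 2: heterogeneous base learners (stand-ins for XGBoost, LightGBM,
# and CatBoost, chosen so the sketch needs only scikit-learn).
base_learners = [
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]

# The meta-classifier is trained on out-of-fold class probabilities from
# the base learners rather than on their hard labels.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)

model = make_pipeline(selector, stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())

print(f"kept {n_selected} of {X.shape[1]} features")
print(f"accuracy={accuracy:.4f}  F1={f1:.4f}")
```

The scores printed by this sketch come from the demo dataset and stand-in models, so they are not comparable to the 97.73% and 98.21% reported in the abstract. If the xgboost, lightgbm, and catboost packages are installed, their scikit-learn-style classifiers could likely be substituted into `base_learners`, though that substitution is untested here.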

Keywords: disease diagnosis; feature selection; XGBoost; CatBoost; Stacking

Classification code: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
