Chi-Square and PCA Based Feature Selection for Diabetes Detection with Ensemble Classifier  Cited by: 1

Authors: Vaibhav Rupapara, Furqan Rustam, Abid Ishaq, Ernesto Lee, Imran Ashraf

Affiliations: [1] School of Computing and Information Sciences, Florida International University, USA [2] Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, 64200, Pakistan [3] Department of Computer Science, Broward College, Broward County, Florida, USA [4] Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, 38541, Korea

Source: Intelligent Automation & Soft Computing, 2023, Issue 5, pp. 1931-1949 (19 pages)

Funding: Supported by the Florida Center for Advanced Analytics and Data Science, funded by Ernesto.Net (under the Algorithms for Good Grant).

Abstract: Diabetes mellitus is a metabolic disease that is ranked among the top 10 causes of death by the World Health Organization. During the last few years, an alarming increase has been observed worldwide, with a 70% rise in the disease since 2000 and an 80% rise in male deaths. If untreated, it results in complications in many vital organs of the human body, which may lead to fatality. Early detection of diabetes is therefore important for starting timely treatment. This study introduces a methodology for the classification of diabetic and normal people using an ensemble machine learning model and feature fusion of Chi-square and principal component analysis. An ensemble model, logistic tree classifier (LTC), is proposed, which combines logistic regression and extra tree classifier through a soft voting mechanism. Experiments are also performed using several well-known machine learning algorithms to analyze their performance, including logistic regression, extra tree classifier, AdaBoost, Gaussian naive Bayes, decision tree, random forest, and k nearest neighbor. In addition, several experiments are carried out using principal component analysis (PCA) and Chi-square (Chi-2) features to analyze the influence of feature selection on the performance of machine learning classifiers. Results indicate that Chi-2 features yield higher performance than both PCA features and the original features. However, the highest accuracy is obtained when the proposed ensemble model LTC is used with the proposed feature fusion framework, which achieves an accuracy score of 0.85, the highest of the available approaches for diabetes prediction. In addition, the statistical T-test confirms the statistical significance of the proposed approach over other approaches.
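The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, treats "feature fusion" as concatenating the Chi-2-selected columns with the PCA components, and uses synthetic data because the abstract does not name the dataset. The function name `build_ltc_pipeline`, the feature counts (`n_chi2`, `n_pca`), and all hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_ltc_pipeline(n_chi2=4, n_pca=4, seed=0):
    """Hypothetical helper: Chi-2 + PCA feature fusion feeding a soft-voting ensemble."""
    # Fusion is assumed here to mean column-wise concatenation of both feature sets.
    fusion = FeatureUnion([
        ("chi2", SelectKBest(chi2, k=n_chi2)),
        ("pca", PCA(n_components=n_pca, random_state=seed)),
    ])
    # Soft voting averages the class probabilities of the two base models.
    ltc = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("etc", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
        ],
        voting="soft",
    )
    # chi2 needs non-negative inputs when fitting, so features are scaled to [0, 1] first.
    return Pipeline([("scale", MinMaxScaler()), ("fusion", fusion), ("ltc", ltc)])


# Synthetic stand-in data (assumption): 8 numeric features, binary label.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = build_ltc_pipeline()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
fused = model[:-1].transform(X_test)  # scaled + fused features, before the classifier
proba = model.predict_proba(X_test)   # averaged probabilities from soft voting
print(f"fused feature shape: {fused.shape}, test accuracy: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 0.85 score reported in the abstract. Wrapping the scaler, fusion step, and ensemble in one `Pipeline` keeps the Chi-2 scores and PCA components fitted on the training split only.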

Keywords: Diabetes mellitus prediction; feature fusion; ensemble classifier; principal component analysis; Chi-square

Classification code: R587.1 [Medicine and Health: Endocrinology]

 
