检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:Fahim Nasir Abdulghani Ali Ahmed Mehmet Sabir Kiraz Iryna Yevseyeva Mubarak Saif
机构地区:[1]School of Computer Science and Informatics,De Montfort University,Leicester,LE19BH,UK [2]Faculty of Computer Science and Information Technology,Universiti Tun Hussein Onn Malaysia,Johor,86400,Malaysia
出 处:《Computers, Materials & Continua》2024年第10期1703-1728,共26页计算机、材料和连续体(英文)
基 金:support from the Cyber Technology Institute(CTI)at the School of Computer Science and Informatics,De Montfort University,United Kingdom,along with financial assistance from Universiti Tun Hussein Onn Malaysia and the UTHM Publisher’s office through publication fund E15216.
摘 要:Integrating machine learning and data mining is crucial for processing big data and extracting valuable insights to enhance decision-making.However,imbalanced target variables within big data present technical challenges that hinder the performance of supervised learning classifiers on key evaluation metrics,limiting their overall effectiveness.This study presents a comprehensive review of both common and recently developed Supervised Learning Classifiers(SLCs)and evaluates their performance in data-driven decision-making.The evaluation uses various metrics,with a particular focus on the Harmonic Mean Score(F-1 score)on an imbalanced real-world bank target marketing dataset.The findings indicate that grid-search random forest and random-search random forest excel in Precision and area under the curve,while Extreme Gradient Boosting(XGBoost)outperforms other traditional classifiers in terms of F-1 score.Employing oversampling methods to address the imbalanced data shows significant performance improvement in XGBoost,delivering superior results across all metrics,particularly when using the SMOTE variant known as the BorderlineSMOTE2 technique.The study concludes several key factors for effectively addressing the challenges of supervised learning with imbalanced datasets.These factors include the importance of selecting appropriate datasets for training and testing,choosing the right classifiers,employing effective techniques for processing and handling imbalanced datasets,and identifying suitable metrics for performance evaluation.Additionally,factors also entail the utilisation of effective exploratory data analysis in conjunction with visualisation techniques to yield insights conducive to data-driven decision-making.
关 键 词:Big data machine learning data mining data visualization label encoding imbalanced dataset sampling techniques
分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.73