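Affiliation: [1] School of Economics and Management, Xiamen University of Technology, Xiamen, Fujian 361005
Source: Journal of Xiamen University: Natural Science (《厦门大学学报(自然科学版)》), 2024, No. 2, pp. 232-240 (9 pages)
Funding: National Natural Science Foundation of China (7180040248); Natural Science Foundation of Fujian Province (2022J011261).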
Abstract: [Objective] Because far fewer consumers purchase a product than do not, online shopping intention prediction is a typical imbalanced data classification problem. This paper studies imbalanced data classification in order to improve the classification accuracy of online shopping intention prediction; the main difficulty is that recognition accuracy for minority-class samples is far lower than for majority-class samples. [Methods] A cost-sensitive Light Gradient Boosting Machine (LightGBM) model based on Bayesian optimization is proposed. First, the misclassification cost is introduced as a penalty factor to modify the LightGBM loss function. Second, threshold moving lowers the model's classification threshold to improve prediction accuracy on minority-class samples. Finally, a Bayesian optimization algorithm tunes the misclassification cost parameter, the classification threshold, and other parameters. [Results] Five typical imbalanced datasets were selected from the KEEL database for comparative experiments. Relative to the standard LightGBM model, the improved LightGBM model raised both AUC and G-mean by about 10%. Relative to the genetic-algorithm-optimized and particle-swarm-optimized cost-sensitive LightGBM models, it generally raised AUC and G-mean by about 4%. Relative to the ADASYN-LightGBM and BorderlineSMOTE-LightGBM models, it generally raised AUC and G-mean by about 3%. [Conclusions] Adding the misclassification cost to the LightGBM loss function as a penalty factor (cost-sensitive learning), lowering the classification threshold by threshold moving, and tuning the misclassification cost parameter, classification threshold, and other parameters of the cost-sensitive LightGBM model with Bayesian optimization together yield higher prediction accuracy on minority-class samples and improve the classification accuracy of online shopping intention prediction.
Abstract (original English version): [Objective] Online shopping intention prediction is a typical imbalanced data classification problem, because the number of consumers who buy goods is much smaller than the number who do not. The purpose of this paper is to address the problem that the recognition accuracy for minority samples is much lower than that for majority samples. [Methods] This paper proposes a cost-sensitive LightGBM (light gradient boosting machine) model based on Bayesian optimization. First, the misclassification cost is introduced as a penalty factor to modify the loss function of LightGBM. Second, the classification threshold of the model is reduced by threshold shifting to improve the prediction accuracy for minority samples. Finally, the misclassification cost, the classification threshold, and other parameters are optimized by a Bayesian optimization algorithm. [Results] Five typical imbalanced datasets are selected from the KEEL database. To verify the effectiveness of the improved LightGBM algorithm proposed in this paper, it is compared with the standard LightGBM algorithm, the genetic-algorithm-optimized cost-sensitive LightGBM algorithm, the particle-swarm-optimized cost-sensitive LightGBM algorithm, the ADASYN-LightGBM (adaptive synthetic sampling approach) algorithm, and the BorderlineSMOTE-LightGBM (borderline synthetic minority oversampling technique) algorithm. AUC (area under curve) and G-mean (geometric mean) are used as evaluation indexes of model performance, and the final experimental results are obtained after 100 iterations with ten-fold cross-validation. Compared with the standard LightGBM model, the AUC and G-mean values of the cost-sensitive LightGBM model both increase by about 10%, indicating that introducing cost-sensitive learning significantly improves the classification performance of the LightGBM model and handles imbalanced data classification better. Compared with genetic algorithm optimization cost-sensitive LightGBM model
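Keywords: imbalanced data; Bayesian optimization; cost-sensitive; LightGBM; online shopping intention prediction
Classification codes: F724.6 [Economics and Management - Industrial Economics]; F713.55 [Automation and Computer Technology - Control Theory and Control Engineering]; TP181 [Automation and Computer Technology - Control Science and Engineering]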
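
The abstract names three ingredients without giving formulas: a misclassification cost that penalizes the loss, threshold moving, and G-mean as an evaluation index. The sketch below illustrates one common way to realize each of them with NumPy only; it is not the authors' implementation. The weighted log loss -[c_fn * y * log(p) + c_fp * (1 - y) * log(1 - p)], the cost values, the threshold 0.33, the toy data, and all function names are assumptions made for illustration. The Bayesian optimization step, which in the paper tunes the cost and the threshold, is omitted here.

```python
import numpy as np


def sigmoid(z):
    """Map raw scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def cost_sensitive_grad_hess(raw_score, y, cost_fn=5.0, cost_fp=1.0):
    """Gradient and Hessian of a cost-weighted binary log loss.

    Assumed loss per sample (p = sigmoid(raw_score)):
        -[cost_fn * y * log(p) + cost_fp * (1 - y) * log(1 - p)]
    cost_fn penalizes missed minority (positive) samples and
    cost_fp penalizes false alarms. Both values are hypothetical.
    """
    p = sigmoid(raw_score)
    grad = cost_fn * y * (p - 1.0) + cost_fp * (1.0 - y) * p
    hess = (cost_fn * y + cost_fp * (1.0 - y)) * p * (1.0 - p)
    return grad, hess


def predict_with_threshold(prob, threshold=0.5):
    """Threshold moving: label a sample positive when prob >= threshold."""
    return (np.asarray(prob) >= threshold).astype(int)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity.

    Assumes both classes occur in y_true.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return float(np.sqrt(sensitivity * specificity))


# Toy imbalanced example: 8 negatives, 2 positives (made-up probabilities).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
prob = np.array([0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.45, 0.35, 0.6])

gm_default = g_mean(y_true, predict_with_threshold(prob, 0.5))
gm_moved = g_mean(y_true, predict_with_threshold(prob, 0.33))
print(f"G-mean at threshold 0.50: {gm_default:.4f}")
print(f"G-mean at threshold 0.33: {gm_moved:.4f}")
```

On this toy data, lowering the threshold recovers the second positive sample at the price of one false positive, so G-mean rises. A gradient and Hessian pair of this shape is what gradient boosting libraries that support custom objectives typically expect; how the paper wires it into LightGBM is not stated in the abstract.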