Insurance Fraud Prediction Based on Multi-Model Fusion Stacking Ensemble Learning  (Cited by: 2)

Published English title: Learning Insurance Fraud Prediction Based on Multi-Model Fusion Stacking Integration


Authors: MIAO Zhiwei (缪智伟); WEI Caimin (韦才敏) [1]

Affiliation: [1] Department of Mathematics, Shantou University, Shantou 515063, Guangdong, China

Source: Journal of Shantou University (Natural Science Edition), 2023, No. 3, pp. 13-24 (12 pages)

Abstract: Drawing on recent research in artificial intelligence, this paper proposes an insurance fraud prediction model based on Stacking ensemble learning that fuses improved XGBoost and LightGBM models. The model is intended to help insurance companies identify fraudulent behavior by the insured and strengthen their own risk control systems. First, XGBoost and LightGBM are fused in a Stacking step to generate two new features, and these two features are merged with the original 40 features as the input to the second Stacking layer. Second, several classifiers are tried in the second layer, including Bagging, LightGBM, and XGBoost as well as traditional machine learning classifiers such as logistic regression, Gaussian naive Bayes, and decision trees; the training and parameters of each model are obtained through K-fold cross-validation and genetic algorithm optimization. The predictive performance of the multi-model Stacking approach is validated and tested on the public insurance fraud dataset from the Alibaba Cloud Tianchi challenge. The results show that, compared with the traditional machine learning classifiers, the ensemble model based on XGBoost and LightGBM Stacking fusion has a higher ability to identify insurance fraud. Finally, the feature importances of the best classifier are computed and visualized, leading to the conclusion that the insured's occupation, the city where the insured incident occurred, the region where it occurred, capital gains, and capital losses are important features for identifying insurance fraud.
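The first-layer step described in the abstract (two base models producing two new features that are appended to the original 40) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two features are computed, so out-of-fold prediction over K folds is assumed here as one common way to build stacking features. The classes `CentroidScorer` and `KNNScorer` are hypothetical pure-Python stand-ins for the XGBoost and LightGBM models, the function names are illustrative, and the data are synthetic rather than the Tianchi dataset.

```python
import random


class CentroidScorer:
    """Hypothetical stand-in for a base model (not XGBoost)."""

    def fit(self, X, y):
        # Store the mean feature vector of each class.
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_score(self, X):
        # Score in [0, 1]: closer to the class-1 centroid gives a higher score.
        scores = []
        for x in X:
            d0 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[0])) ** 0.5
            d1 = sum((a - b) ** 2 for a, b in zip(x, self.centroids[1])) ** 0.5
            scores.append(d0 / (d0 + d1) if d0 + d1 > 0 else 0.5)
        return scores


class KNNScorer:
    """Hypothetical stand-in for a second base model (not LightGBM)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_score(self, X):
        # Fraction of positive labels among the k nearest training rows.
        scores = []
        for x in X:
            nearest = sorted(
                (sum((a - b) ** 2 for a, b in zip(x, row)), label)
                for row, label in zip(self.X, self.y)
            )[: self.k]
            scores.append(sum(label for _, label in nearest) / self.k)
        return scores


def out_of_fold_column(make_model, X, y, k=5, seed=0):
    """Score every row with a model that never saw that row during training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    column = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(held_out, model.predict_score([X[i] for i in held_out])):
            column[i] = s
    return column


def add_stacking_features(X, y, model_factories, k=5):
    """Append one out-of-fold score column per base model to the original features."""
    columns = [out_of_fold_column(make, X, y, k) for make in model_factories]
    return [list(row) + [col[i] for col in columns] for i, row in enumerate(X)]


# Synthetic data: 200 rows, 40 features, only the first three are informative.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(label, 1.0) if j < 3 else rng.gauss(0.0, 1.0) for j in range(40)]
     for label in y]

X_stacked = add_stacking_features(X, y, [CentroidScorer, KNNScorer])
print(len(X_stacked[0]))  # 42 = 40 original features + 2 stacking features
```

In a real pipeline the two stand-ins would presumably be replaced by `xgboost.XGBClassifier` and `lightgbm.LGBMClassifier`, and the 42-column matrix would be passed to whichever second-layer classifier is being evaluated. Scoring each row only with a model trained on the other folds keeps the new features from leaking the row's own label into the second layer.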

Keywords: insurance fraud prediction; XGBoost; LightGBM; Stacking model fusion; feature importance; genetic algorithm

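Classification codes: TP311 [Automation and Computer Technology: Computer Software and Theory]; TP391.4 [Automation and Computer Technology: Computer Science and Technology]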

 
