Authors: 缪智伟 (MIAO Zhiwei); 韦才敏 (WEI Caimin) [1] (Department of Mathematics, Shantou University, Shantou 515063, Guangdong, China)
Source: 《汕头大学学报(自然科学版)》 (Journal of Shantou University: Natural Science Edition), 2023, No. 3, pp. 13-24 (12 pages)
Abstract: Drawing on recent theoretical research in artificial intelligence, this paper proposes an insurance fraud prediction model built on a Stacking ensemble that fuses improved XGBoost and LightGBM models. The model is intended to help insurance companies identify fraudulent claims by insured persons and strengthen their risk-control systems. First, XGBoost and LightGBM are fused in a first Stacking layer to generate two new features, which are merged with the original 40 features to form the input of the second Stacking layer. Second, several classification models are tried in the second layer: Bagging, LightGBM, and XGBoost, together with traditional machine-learning classifiers such as logistic regression, Gaussian naive Bayes, and decision trees. The training and parameters of each model are obtained through K-fold cross-validation and genetic-algorithm optimization. The predictive performance of the multi-model Stacking ensemble is validated and tested on the public insurance fraud dataset from the Alibaba Cloud Tianchi challenge. The results show that the ensemble based on XGBoost and LightGBM Stacking identifies insurance fraud more effectively than the traditional classifiers. Finally, the feature importances of the best classification model are computed and visualized, leading to the conclusion that the insured person's occupation, the city and the region where the insured incident occurred, capital gains, and capital losses are important features for identifying insurance fraud.