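Affiliation: [1] School of Management, University of Shanghai for Science and Technology, Shanghai
Source: Modeling and Simulation (《建模与仿真》), 2023, No. 4, pp. 3747-3755 (9 pages)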
Abstract: Using the Zhongyuan Bank personal credit loan default data provided by the CCF competition, this paper carries out data cleaning and feature engineering, reducing the initial 38 features to 18. Drawing on the 5C theory and the expected income theory, it examines the key factors that affect a bank's personal credit risk. Ranked by feature importance, the top five factors are: total revolving credit balance, number of days from the initial date to the loan disbursement date, the borrower's average loan score, the current loan interest rate, and the anonymous variable f0. To improve the accuracy of banks' personal credit risk assessment, the paper uses a random forest model to compare three methods for handling imbalanced data: SMOTE, random undersampling, and SMOTEENN. The SMOTEENN combined sampling method performs best. Four machine learning models are then built: decision tree, random forest, AdaBoost, and LightGBM. The results show that, after balancing, LightGBM achieves the highest accuracy, reaching 96.1%.
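The abstract names SMOTE as one of the resampling methods compared but does not describe an implementation. As a rough illustration only, the following is a minimal sketch of the basic SMOTE idea: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for this example, not the authors' code, and the ENN cleaning step that SMOTEENN adds is not shown.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data invented for illustration: two features per sample.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(1.0, 5.0), (2.0, 6.0), (3.0, 5.5), (2.5, 6.5)]

# Generate enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "20 20"
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the region the minority class already occupies; in practice a maintained library implementation would normally be used instead of a hand-written one.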