基于Focal Loss修正交叉熵损失函数的信用风险评价模型及实证  (Cited by: 31)

Credit Risk Evaluation Model and Empirical Research Based on Focal Loss Modified Cross-Entropy Loss Function


Authors: YANG Lian (杨莲); SHI Bao-feng (石宝峰) [1,2] (College of Economics and Management, Northwest A&F University, Yangling 712100, China; Research Center on Credit and Big Data Analytics, Northwest A&F University, Yangling 712100, China)

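Affiliations: [1] College of Economics and Management, Northwest A&F University, Yangling 712100, Shaanxi, China; [2] Research Center on Credit and Big Data Analytics, Northwest A&F University, Yangling 712100, Shaanxi, China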

Source: Chinese Journal of Management Science (《中国管理科学》), 2022, No. 5, pp. 65-75 (11 pages)

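Funding: National Natural Science Foundation of China, General Program (72173096, 71873103); National Natural Science Foundation of China, Key Program (71731003); Soft Science Project of the Rural Revitalization Expert Advisory Committee of the Office of the Central Rural Work Leading Group and the Ministry of Agriculture and Rural Affairs (202122); Shaanxi Provincial Social Science Fund (2018D51); Young Science and Technology Star Project of the Shaanxi Innovative Talents Promotion Plan (2019KJXX-070); 中和农信 "Starry Sky Plan" (星空计划) project (K4030218167); Zhongying Young Scholars Program (仲英青年学者) of Northwest A&F University (2021-04).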

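Abstract: In credit evaluation, default and non-default samples are imbalanced, so evaluation models tend to over-recognize non-default samples and under-recognize default samples, especially the hard samples among them. To address this, the Focal Loss function, widely used in image recognition, is introduced into credit evaluation to build a credit risk evaluation model whose cross-entropy loss is modified by Focal Loss, and the model is validated on three data sets. The contributions are threefold. First, a focusing parameter γ is introduced into the cross-entropy loss of credit evaluation to construct the modulating factor (1-y′)^γ; by increasing the weight of hard samples in the target loss, an ADASYN-BPNN-FocalLoss credit risk evaluation model is built, which preserves the model's ability to identify default samples in imbalanced data and remedies the inability of existing deep learning credit evaluation models to identify hard samples in imbalanced data effectively. Second, the proportion r_i of non-default samples among the K nearest neighbors of each default sample is computed to determine the number g_i of samples to be synthesized, and the SMOTE algorithm is then used to synthesize new default samples; this ensures that the newly generated default samples s_i reflect the basic characteristics of the original credit data and also addresses the low discriminative power that the imbalance between default and non-default samples causes in evaluation models. Third, the proposed model is compared with five models, including ADASYN-BPNN-CrossEntropy, decision tree, K-nearest neighbors and random forest, on loan data of 1,298 Chinese rural households and on the German and Australian credit data sets published by UCI; the empirical results show that the proposed model outperforms the existing models on indicators such as AUC and Type2-error. The method effectively improves the model's ability to identify hard samples and improves default prediction performance.

Credit evaluation models play an important role in helping financial institutions identify default risk. However, because default and non-default samples are imbalanced, models tend to over-recognize non-default samples and under-recognize default samples. Some default samples, called hard samples, are difficult to identify. The key to improving prediction performance is therefore to improve the model's ability to recognize hard samples. In practice, existing deep learning credit evaluation models that take cross-entropy as the loss function treat hard samples and easy samples as contributing equally to the target loss, which hinders the effective identification of hard samples. To fill this gap, this study advances in three aspects. First, the Focal Loss function is introduced into the field of credit scoring. The modulating factor (1-y′)^γ in the Focal Loss function increases the weight of hard samples and decreases the weight of easy samples, so that the model focuses on hard samples during training. Second, the ADASYN model is used to balance the training data, addressing the low prediction performance that sample imbalance causes in evaluation models.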
By calculating the proportion of non-default samples among the K nearest neighbors of each default sample in the training set, and combining it with the numbers of default and non-default samples in the training set, the number of default samples to be generated is determined. The SMOTE algorithm is then used to generate the new default samples. Third, the proposed model is applied to microfinance data of 1,298 rural households in China and assessed with four evaluation measures, i.e., Accuracy, AUC, Type1-error and Type2-error. Compared with ADASYN-BPNN-CrossEntropy, Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Machine (SVM), our proposed model is su
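
The effect of the modulating factor (1-y′)^γ described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the common binary form of Focal Loss without a class-weighting term; the function names, the default γ = 2 and the clipping constant are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def binary_focal_loss(y_true, y_pred, gamma=2.0, eps=1e-7):
    """Per-sample binary focal loss (assumed form, no class-weight term).

    y_true: labels in {0, 1} (1 = default); y_pred: predicted P(default).
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions so that log() never receives 0.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    # (1 - p_t)^gamma shrinks the loss of confidently correct (easy) samples.
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Plain cross-entropy is the gamma = 0 special case."""
    return binary_focal_loss(y_true, y_pred, gamma=0.0, eps=eps)

if __name__ == "__main__":
    labels = [1, 1]        # two default samples
    probs = [0.9, 0.1]     # an easy one and a hard one
    print("cross-entropy:", binary_cross_entropy(labels, probs))
    print("focal loss   :", binary_focal_loss(labels, probs))
```

With γ = 0 the factor equals 1 and the loss reduces to ordinary cross-entropy. For γ > 0 the easy sample's loss shrinks far more than the hard sample's, so hard samples dominate the total loss in relative terms.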

Keywords: credit evaluation; Focal Loss; BP neural network; adaptive synthetic oversampling (ADASYN)
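
The oversampling step summarized in the abstract (compute r_i, derive g_i, then interpolate SMOTE-style) can be sketched as follows. This follows the commonly described ADASYN recipe and may differ in detail from the paper's formulas; the function names, the balance parameter `beta` and the rounding rule are assumptions made for illustration. The sketch assumes `k` is smaller than the number of training samples.

```python
import numpy as np

def adasyn_counts(X, y, k=5, beta=1.0):
    """Return (indices of default samples, r_i, g_i); default means y == 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == 1)
    n_min = minority_idx.size
    n_maj = y.size - n_min
    total = (n_maj - n_min) * beta  # synthetic samples wanted overall
    # Distances from every default sample to every training sample.
    d = np.linalg.norm(X[minority_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(n_min), minority_idx] = np.inf  # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    r = (y[nn] == 0).mean(axis=1)  # share of non-default neighbors
    if r.sum() == 0:
        g = np.zeros(n_min, dtype=int)
    else:
        g = np.rint(r / r.sum() * total).astype(int)
    return minority_idx, r, g

def synthesize(X, y, k=5, beta=1.0, seed=0):
    """Generate g_i new default samples around each default sample."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    minority_idx, _, g = adasyn_counts(X, y, k, beta)
    X_min = X[minority_idx]
    k_min = min(k, len(X_min) - 1)
    if k_min < 1:  # interpolation needs at least two default samples
        return np.empty((0, X.shape[1]))
    # Neighbors are searched among default samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn_min = np.argsort(d, axis=1)[:, :k_min]
    new = []
    for i, g_i in enumerate(g):
        for _ in range(g_i):
            j = rng.choice(nn_min[i])  # random default neighbor
            lam = rng.random()         # interpolation weight in [0, 1)
            new.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(new).reshape(-1, X.shape[1])
```

Default samples surrounded mostly by non-default neighbors get a larger r_i and therefore more synthetic points, which concentrates the new data near the class boundary.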

Classification codes: F830.56 [Economics and Management: Finance]; N945.16 [General Natural Science: Systems Science]

 
