A Credit Risk Evaluation Model for Imbalanced Data Classification Based on Class Balanced Loss Modified Cross Entropy Function

Cited by: 10


Authors: YANG Lian (杨莲); SHI Baofeng (石宝峰) [1,2]; DONG Yizhe (董轶哲)

Affiliations: [1] College of Economics and Management, Northwest A&F University, Yangling 712100, Shaanxi, China; [2] Research Center on Credit and Big Data Analytics, Northwest A&F University, Yangling 712100, Shaanxi, China; [3] University of Edinburgh Business School, Edinburgh EH8 9JS, UK

Source: Journal of Systems & Management (《系统管理学报》), 2022, No. 2, pp. 255-269, 289 (16 pages in total)

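Funding: General Program of the National Natural Science Foundation of China (71873103, 72173096); Key Program of the National Natural Science Foundation of China (71731003); Soft Science Research Project of the Rural Revitalization Expert Advisory Committee of the Office of the Central Rural Work Leading Group and the Ministry of Agriculture and Rural Affairs (2021-22); Zhonghe Nongxin (中和农信) "Starry Sky Plan" project (K4030218167); Zhongying Young Scholars Program of Northwest A&F University.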

Abstract: Traditional credit risk prediction models trained on imbalanced data over-recognize non-default samples and under-recognize default samples. To address this, the paper introduces the Class Balanced Loss function into credit risk evaluation and builds an imbalanced-sample credit risk evaluation model in which Class Balanced Loss modifies the cross-entropy function: BPNN-CBCE (back propagation neural network with class balanced cross entropy). The model is compared with five classifiers, namely BPNN-CE (back propagation neural network with cross entropy), SVM (support vector machine), DT (decision tree), RF (random forest), and KNN (K-nearest neighbor), to verify its effectiveness in predicting credit risk on 1,534 farmer loans from a financial institution in China. Its robustness is then tested on the German credit data set published by UCI (University of California, Irvine). On the farmer loan data, BPNN-CBCE generally outperforms BPNN-CE, SVM, DT, RF, and KNN in AUC (area under the curve) and default recall: its default recall is 41.3 percentage points higher than that of the five comparison models, and its AUC is 15.6 percentage points higher. On the German data set, BPNN-CBCE also outperforms all five comparison models in AUC and default recall. The BPNN-CBCE model therefore identifies default samples in imbalanced farmer credit data well and can effectively reduce the losses that financial institutions incur from misjudging default customers.

Innovations and features: ① The balance factor ω in Class Balanced Loss increases the weight of default samples in the target loss and decreases the weight of non-default samples, so that the contributions of the two classes to the target loss are adjusted objectively. This compensates for the inability of the cross-entropy function to adjust the loss weights of the two classes and overcomes the over-recognition of non-default samples and under-recognition of default samples caused by sample imbalance. ② Taking data overlap into account, the random covering method samples default and non-default loans separately, without replacement, until the full sample spaces X_(default) and X_(non-default) are covered without overlap; this yields the effective number of samples for each class of loan customer. The approach reflects the fact that, because real data are intrinsically similar, a newly added sample becomes increasingly likely to be a near-duplicate of existing samples as the sample size grows, and it keeps the re-weighting of the two classes' losses, which is based on effective samples, objective. The paper introduces the Class Balanced Loss function from the field of image recognition …
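The re-weighting idea described in the abstract can be illustrated with a minimal sketch. It assumes the effective-number formula from the original Class Balanced Loss formulation in image recognition, E_n = (1 - β^n) / (1 - β), with class weights proportional to 1 / E_n. The abstract does not state how the paper computes ω, so the hyperparameter `beta`, the normalization, the function names, and the toy class counts below are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def effective_number(n, beta):
    """Effective number of samples, E_n = (1 - beta**n) / (1 - beta).

    Assumed formula (standard Class Balanced Loss); beta lies in [0, 1).
    """
    n = np.asarray(n, dtype=float)
    return (1.0 - beta ** n) / (1.0 - beta)

def class_balanced_weights(class_counts, beta=0.999):
    """Per-class weights proportional to 1 / E_n.

    The weights are rescaled to sum to the number of classes
    (an assumed convention, chosen so beta = 0 gives all ones).
    """
    inv = 1.0 / effective_number(class_counts, beta)
    return inv * len(inv) / inv.sum()

def class_balanced_cross_entropy(y_true, p_default, weights, eps=1e-12):
    """Weighted binary cross entropy.

    y_true:    1 = default, 0 = non-default
    p_default: predicted probability of default
    weights:   (w_non_default, w_default)
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_default, dtype=float), eps, 1.0 - eps)
    w = np.where(y == 1, weights[1], weights[0])
    per_sample = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(np.mean(w * per_sample))

# Hypothetical toy counts: 900 non-default loans, 100 default loans.
weights = class_balanced_weights([900, 100], beta=0.999)
loss = class_balanced_cross_entropy([0, 0, 1], [0.1, 0.2, 0.3], weights)
```

With these toy counts the minority (default) class receives the larger weight, so a missed default costs more than a misclassified non-default. Setting `beta = 0` gives every class an effective number of 1 and recovers ordinary, unweighted cross entropy.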

Keywords: credit evaluation; Class Balanced Loss; BP neural network; cross entropy; microcredit

Classification codes: F830.56 [Economics and Management: Finance]; N945.16 [General Natural Science: Systems Science]

 
