Prediction of SME Loan Default Based on XGBoost Model from Regional Heterogeneity Perspective

Original title: 地区异质性视角下基于XGBoost模型的中小企业贷款违约预测

Authors: YE Song (叶松), LI Nan (李楠), YANG Xiaoguang (杨晓光) [1,2] (Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100049, China; School of Business, Shandong Normal University, Jinan 250014, China)

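Affiliations: [1] Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190 [2] School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190 [3] School of Business, Shandong Normal University, Jinan, Shandong 250014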

Source: 《运筹与管理》 (Operations Research and Management Science), 2024, No. 12, pp. 129-136 (8 pages)

Funding: National Natural Science Foundation of China (72003110, 72192800, T2293771).

Abstract: This paper proposes XGB_F, a loan default prediction model based on regional heterogeneity, and tests it empirically on small and medium enterprise (SME) loan data from a bank in China. SME loan default data are typically imbalanced, with defaulted loans far outnumbered by performing loans. Existing default prediction models generally balance the samples through sampling techniques, which adds to the data-processing burden and tends to overlook sample heterogeneity at the regional level. After grouping the samples by region, this paper introduces the Focal Loss function into the XGBoost model to build the XGB_F model, which mitigates the class imbalance. Prediction results on the SME loan data of a bank in China show that, across samples from different regions, the Recall and G-mean of the XGB_F model are significantly better than those of statistical and machine learning models, such as logistic regression and random forest, that rely on sampling techniques. Finally, based on the feature importance of the XGB_F model, the paper identifies the key indicators for detecting defaulted loans in each region, providing a basis for banks' pre-loan decisions and post-loan management in different regions.

With China's marketization, small and medium enterprises (SMEs) hold an increasingly important position in the national economy. Bank loans are the main financing channel for SMEs, but SMEs have long faced prominent financing constraints in this channel, which limits their development. Accurate credit risk identification and prevention is therefore key to solving the credit financing problems of SMEs. SME loan data often suffer from an imbalanced ratio of default samples, and existing default prediction models generally balance the data through sampling techniques, ignoring the heterogeneity of loan customers at the regional level. We therefore propose an XGB_F loan default prediction model based on regional heterogeneity. In the proposed XGB_F model, we first group the loan data by province and then model the data of each province with the XGBoost algorithm, applying the Focal Loss function to mitigate the imbalanced ratio of default samples and predict loan default. XGBoost has the advantages of high accuracy, strong flexibility and insensitivity to data imbalance, which ensures its outstanding prediction performance. Focal Loss is a loss function designed for imbalanced data; it enhances classification performance by focusing training on the model's ability to identify minority-class and hard-to-classify samples. For comparison, the machine learning models most commonly used in the existing literature are chosen; each comparison pipeline consists of resampling the imbalanced data followed by a classifier that predicts loan default. The SMOTE-ENN algorithm is selected for sampling, and four classifiers, namely logistic regression, SVM, random forest and GBDT, are selected for classification. To verify the effectiveness of the proposed XGB_F model based on regional heterogeneity, a comparative analysis of evaluation metrics is conducted on SME loan data from a bank in China. The data cover 22 provinces and municipalities directly under the administration of China's central government, with significant differences shown in the distribution of lo
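To make the Focal Loss idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard binary focal loss, -α_t (1 - p_t)^γ log(p_t), applied to a raw score (logit), and it derives the first and second derivatives that a gradient-boosting custom objective would need. The function names, the defaults γ = 2 and α = 0.25, and the G-mean definition (the geometric mean of recall and specificity) are assumptions for illustration; the abstract does not state the paper's settings.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _p_t_and_alpha_t(z, y, alpha):
    # p_t is the predicted probability of the true class;
    # alpha_t is the class weight of the true class.
    p = _sigmoid(z)
    p_t = p if y == 1 else 1.0 - p
    a_t = alpha if y == 1 else 1.0 - alpha
    # Clip to keep log() finite for extreme scores.
    p_t = min(max(p_t, 1e-12), 1.0 - 1e-12)
    return p_t, a_t


def focal_loss(z, y, gamma=2.0, alpha=0.25):
    """Focal loss of one sample with raw score z and label y in {0, 1}."""
    p_t, a_t = _p_t_and_alpha_t(z, y, alpha)
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def focal_grad_hess(z, y, gamma=2.0, alpha=0.25):
    """First and second derivative of the focal loss with respect to z."""
    p_t, a_t = _p_t_and_alpha_t(z, y, alpha)
    s = 1.0 if y == 1 else -1.0          # sign of d p_t / d z
    u = (1.0 - p_t) ** gamma             # modulating factor
    log_pt = math.log(p_t)
    v = gamma * p_t * log_pt + p_t - 1.0
    grad = s * a_t * u * v
    hess = a_t * u * p_t * ((1.0 - p_t) * (gamma * log_pt + gamma + 1.0) - gamma * v)
    return grad, hess


def g_mean(y_true, y_pred):
    """Geometric mean of recall (default class = 1) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)
```

With γ = 0 and α = 0.5 the expressions reduce to half the usual logistic-loss gradient and Hessian, which is a quick sanity check. For γ > 0 the second derivative can turn negative on badly misclassified samples, so implementations that pass it to a booster often clamp it to a small positive value. Under the per-province setup the abstract describes, one such objective would be used when fitting a separate model on each province's loans.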

Keywords: SMEs; credit default; default prediction

Classification code: F832.4 [Economics and Management: Finance]

 
