Research on Credit Risk Prediction for Small and Medium-Sized Enterprises by Integrating Unstructured Textual Information

Original title: 融合非结构化文本信息的中小企业信用风险预测研究

Authors: MENG Xiangjun (孟祥俊); CHEN Jindong (陈进东); ZHANG Jian (张健) [1,2]

Affiliations: [1] School of Economics & Management, Beijing Information Science and Technology University, Beijing 100192; [2] Beijing International Science and Technology Cooperation Base for Intelligent Decision and Big Data Application, Beijing 100192

Source: Journal of Systems Science and Mathematical Sciences (《系统科学与数学》), 2024, No. 6, pp. 1608-1629 (22 pages)

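Funding: National Key R&D Program of China (2019YFB1405303); Outstanding Young Talents Cultivation Program of Beijing Municipal Universities (BPHR202203233); National Natural Science Foundation of China, General Program (72174018)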

Abstract: This study predicts credit risk for small and medium-sized enterprises (SMEs) using an indicator system that incorporates unstructured textual information from annual reports and news reports. Recursive Feature Elimination (RFE) is used to select the original indicators, which are then supplemented with indicators of annual report text complexity, annual report sentiment tone, and news sentiment polarity. Using Bayesian-optimized XGBoost (BO-XGBoost) and other methods, the credit risk prediction performance of several machine learning models is compared across different feature sets. The SHAP (SHapley Additive exPlanations) interpretability method is then used to produce visual local and global explanations of the model. The results show that adding the unstructured textual indicators improves the performance of all models to varying degrees, which indicates that these features have good predictive value for SME credit risk. BO-XGBoost outperforms the baseline, and the unstructured textual features rank near the top in feature importance. SHAP waterfall, scatter, and dependence plots are used to explain the causes of misclassified samples, the direction and magnitude of each feature's effect on the model output, and how credit risk changes as the unstructured textual features vary. Principal-agent theory and other theories are used to strengthen the theoretical support for the empirical conclusions.
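The abstract names SHAP as the explanation method but gives no formulas or code. As a rough illustration of the idea behind a local explanation only, the sketch below computes exact Shapley values for a small hand-written risk score. Everything in it is an assumption made for illustration: the function names `shapley_values` and `toy_risk_score`, the feature names, the coefficients, and the all-zero baseline. It is neither the authors' BO-XGBoost model nor the API of the `shap` library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are set to their baseline value
    (an assumed convention for this sketch).
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        # Evaluate the model with only the coalition's features "switched on".
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(point)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk_score(f):
    # Hypothetical score: higher means riskier. Coefficients are made up.
    return (
        0.5 * f["leverage"]
        + 0.4 * f["leverage"] * f["report_complexity"]
        - 0.3 * f["report_tone"]
        - 0.2 * f["news_polarity"]
    )


# Hypothetical sample and baseline (all values invented for illustration).
sample = {
    "leverage": 0.8,
    "report_complexity": 0.5,
    "report_tone": -0.4,
    "news_polarity": -0.6,
}
baseline = {k: 0.0 for k in sample}

phi = shapley_values(toy_risk_score, sample, baseline)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which is the additivity property that a waterfall plot visualizes. Exact enumeration grows exponentially with the number of features, so it is only practical for toy cases like this one.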

Keywords: credit risk prediction; annual report text; news sentiment; small and medium-sized enterprises; BO-XGBoost; SHAP

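Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; F276.3 [Automation and Computer Technology: Control Science and Engineering]; F275 [Economics and Management: Business Administration]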
