Authors: 孟祥俊 (MENG Xiangjun); 陈进东 (CHEN Jindong); 张健 (ZHANG Jian) [1,2]
Affiliations: [1] School of Economics & Management, Beijing Information Science and Technology University, Beijing 100192; [2] Beijing International Science and Technology Cooperation Base for Intelligent Decision and Big Data Application, Beijing 100192
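Source: 《系统科学与数学》 (Journal of Systems Science and Mathematical Sciences), 2024, No. 6, pp. 1608-1629 (22 pages)
Funding: National Key R&D Program of China (2019YFB1405303); Outstanding Young Talent Cultivation Program for Beijing Municipal Universities (BPHR202203233); National Natural Science Foundation of China, General Program (72174018)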
Abstract: This study predicts the credit risk of small and medium-sized enterprises (SMEs) using an indicator system that incorporates unstructured text from annual reports and news reports. Recursive feature elimination (RFE) is used to screen the original indicators, which are then supplemented with indicators for annual report text complexity, annual report sentiment tone, and news sentiment polarity. Using Bayesian-optimization-based XGBoost (BO-XGBoost) and other methods, the credit risk prediction performance of several machine learning models is compared across different feature sets. The SHAP (SHapley Additive exPlanations) interpretability method is used to give visual local and global explanations of the model. The results show that adding the unstructured text indicators improves the performance of every model to varying degrees, which indicates that these features are good predictors of SME credit risk. BO-XGBoost outperforms the baseline, and the unstructured text features rank near the top in feature importance. SHAP waterfall, scatter, and dependence plots explain the causes of misclassified samples, the direction and magnitude of each feature's effect on the model output, and the evolutionary trend between the unstructured text features and credit risk. Principal-agent theory and other theories provide further theoretical support for the empirical conclusions.
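Keywords: credit risk prediction; annual report text; news sentiment; SMEs; BO-XGBoost; SHAP
Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; F276.3 [Automation and Computer Technology: Control Science and Engineering]; F275 [Economics and Management: Business Management]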
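
The abstract relies on SHAP to attribute each prediction to individual features. As a rough illustration of the idea behind those attributions, the sketch below computes exact Shapley values for a single sample in plain Python. It is not the authors' implementation and does not use the `shap` library; the scoring function `toy_risk_score`, its coefficients, the feature names, and the baseline values are all hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one sample.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for tiny examples.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk_score(features):
    """Hypothetical risk score, NOT the model from the paper."""
    leverage, report_complexity, news_sentiment = features
    return (0.5 * leverage
            + 0.3 * report_complexity
            - 0.4 * news_sentiment
            + 0.2 * leverage * report_complexity)


sample = [0.8, 0.6, -0.5]    # hypothetical firm
baseline = [0.5, 0.5, 0.0]   # hypothetical reference values
phi = shapley_values(toy_risk_score, sample, baseline)

for name, contribution in zip(
        ["leverage", "report_complexity", "news_sentiment"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The attributions sum to the difference between the score for the sample and the score for the baseline, which is the additive property that a waterfall plot displays. In this toy case the negative news sentiment raises the risk score, because the hypothetical coefficient on that feature is negative.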