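Affiliations: [1] Training Department for Students Going Abroad, Sichuan University, Chengdu, Sichuan 610064; [2] Business School, Sichuan University, Chengdu, Sichuan 610064; [3] School of Information Management, Central China Normal University, Wuhan, Hubei 430079; [4] School of Business, Macau University of Science and Technology, Macao SAR, China 999078; [5] School of Management, Chengdu University of Information Technology, Chengdu, Sichuan 610225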
Source: Chinese Journal of Management Science (《中国管理科学》), 2022, Issue 12, pp. 211-221 (11 pages)
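Funding: General Program of the National Natural Science Foundation of China (72171160, 71974020); Sichuan Province Distinguished Young Scholars Fund (2020JDJQ0021); Sichuan Province "Tianfu Ten-Thousand Talents Program" (0082204151153); Sichuan Province Soft Science Research Program (2020JDR0120); Sichuan University National Leading Talent Cultivation Fund (sksyl2021-03).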
Abstract: With the popularization of credit business, effective risk aversion is one of the main means by which the financial industry maintains stable profits, and credit risk is one of the most common and important risk types in that industry. Accurate customer credit scoring is therefore very important. However, the class distribution of the customer data used to build credit-scoring models is often highly imbalanced: customers with good credit significantly outnumber customers with bad credit. Moreover, only the few customers who successfully obtained loans can be labeled according to their subsequent behavior, while the many applicants who were refused loans cannot be labeled. These characteristics make it very difficult to build scientific and accurate customer credit-scoring models, and existing research does not address them well. To fill this gap, this study combines meta cost-sensitive learning, semi-supervised learning, and heterogeneous ensemble learning, and proposes a Metacost-based semi-supervised heterogeneous ensemble model (Meta-Semi-HE) for customer credit scoring. The model has three stages: 1) Metacost is used to modify the initial labeled training set, yielding L_m; 2) N heterogeneous classifiers h_i (i = 1, …, N) are trained on L_m with AdaBoost, the concomitant ensemble H_i is used to selectively label samples from the unlabeled data set, these samples are added to L_m, and the N heterogeneous classifiers are retrained on the new L_m; this step is repeated, improving the member classifiers, until the termination condition is satisfied; 3) the final N classifiers are used to classify the samples of the test set. An empirical analysis on six customer credit-scoring datasets shows that Meta-Semi-HE achieves better customer credit-scoring performance than three existing semi-supervised ensemble models and two supervised ensemble models in terms of AUC, F-measure, Type I accuracy, and Type II accuracy. A new way of thinking for banks’ customer credit-…
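Keywords: customer credit scoring; imbalanced class distribution; cost-sensitive learning; semi-supervised learning; heterogeneous ensemble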
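CLC numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; F270 [Automation and Computer Technology: Control Science and Engineering]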