Application of Categorical Embedding in Predicting Automobile Insurance Claim Frequency

Original title: 分类嵌入在车险索赔次数预测中的应用


Authors: ZHANG Lian-zeng (张连增); LUO Lai-juan (罗来娟); XIAO Yu-gu (肖宇谷) [2,3]; LI Hao-nan (李浩男)

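Affiliations: [1] School of Finance, Nankai University; [2] Center for Applied Statistics, Renmin University of China; [3] School of Statistics, Renmin University of China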

Source: Insurance Studies (《保险研究》), 2024, No. 9, pp. 26-43 (18 pages)

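Funding: Supported by the Major Project of Key Research Bases for Humanities and Social Sciences of the Ministry of Education, "Research on Risk Management and Actuarial Models in the Digital Era" (Project No. 22JJD910003), and the Tianjin Postgraduate Research Innovation Project (No. 2022SKY038).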

Abstract: Since September 2020, the comprehensive reform of auto insurance has imposed increasingly strict requirements on precise pricing. In the big data era in particular, the growing complexity of data features and the larger number of levels in categorical variables pose significant challenges for traditional actuarial statistical methods such as generalized linear models (GLMs). Using a vehicle group insurance dataset from a domestic insurance company, this paper applies four types of machine learning methods and four categorical variable encoding schemes, compares the performance of the encoding schemes in predicting automobile insurance claim frequency under different evaluation metrics, and uses SHAP to improve the interpretability of the machine learning models. The empirical results show the following. First, different machine learning models may suit different categorical variable encodings, so the encoding should be chosen to match the characteristics of the model. Second, compared with one-hot encoding, categorical embedding significantly reduces model runtime and improves efficiency. Third, according to the SHAP interpretability results, the fleet's average compulsory traffic insurance loss ratio over the past three years (AveCumpRatio) is the most important factor influencing vehicle claim frequency. Fourth, the embedding vectors generated by categorical embedding correspond to the different levels of the categorical variables, and the distances between embedding vectors can be applied to the segmentation and risk rating of policyholders. The paper refines the application of categorical embedding in auto insurance pricing, effectively improving prediction accuracy and runtime efficiency, and contributes to more precise and differentiated auto insurance pricing.
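The contrast between one-hot encoding and categorical embedding, and the use of distances between embedding vectors for grouping, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the level names, the two-dimensional embedding values (hand-picked here, whereas in practice they would be learned by a model), and the helper function names. None of it is taken from the paper's data or code.

```python
import math

# Hypothetical levels of a categorical rating variable (not from the paper).
LEVELS = ["taxi", "bus", "private_car", "truck"]

# Hypothetical 2-dimensional embedding table, one row per level.
# Real embeddings would be learned during model training; these
# values are hand-picked purely for illustration.
EMBEDDING = [
    [0.90, 0.10],   # taxi
    [0.80, 0.20],   # bus
    [-0.50, 0.40],  # private_car
    [0.10, -0.70],  # truck
]


def one_hot(index, n_levels):
    """Return a one-hot vector with 1.0 at position `index`."""
    vec = [0.0] * n_levels
    vec[index] = 1.0
    return vec


def embed_via_one_hot(index, table):
    """Multiply the one-hot row vector by the embedding table."""
    oh = one_hot(index, len(table))
    dim = len(table[0])
    return [sum(oh[i] * table[i][j] for i in range(len(table)))
            for j in range(dim)]


def embed_via_lookup(index, table):
    """Equivalent shortcut: read the corresponding row of the table."""
    return list(table[index])


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_level(index, table):
    """Index of the other level whose embedding vector is closest."""
    others = [k for k in range(len(table)) if k != index]
    return min(others, key=lambda k: euclidean(table[index], table[k]))


if __name__ == "__main__":
    for i, name in enumerate(LEVELS):
        print(name, embed_via_lookup(i, EMBEDDING),
              "nearest:", LEVELS[nearest_level(i, EMBEDDING)])
```

In this sketch each level is represented by two numbers instead of a four-element one-hot vector, and the gap widens as the number of levels grows. The nearest-neighbour lookup is only one simple way to turn distances into groups; a clustering method could be substituted.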

Keywords: auto insurance pricing; claim frequency; categorical variables; embedding; SHAP

Classification number: F842.634 [Economics and Management: Insurance]

 
