Customer Churn Prediction Based on Generation Data Reconstruction Using Diffusion Model

Authors: Yang Bin; Wang Zhengyang [2]; Cheng Zihang; Zhao Huiying; Wang Xin [1]; Guan Yu [2,3]; Cheng Xinzhou

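Affiliations: [1] China Unicom Research Institute, Beijing 100048; [2] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876; [3] Yunnan Key Laboratory of Software Engineering (Yunnan University), Kunming 650504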

Source: Journal of Computer Research and Development, 2024, Issue 2, pp. 324-337 (14 pages)

Funding: Open Fund Project of the Yunnan Key Laboratory of Software Engineering (2023SE202).

Abstract: In data mining, data imbalance commonly degrades model prediction accuracy, and user privacy protection is often left unaddressed. Generating synthetic data is an important remedy for both problems. However, in scenarios dominated by structured data, the features are high-dimensional and largely unrelated to one another, which makes it difficult to generate high-quality data. Given the success of diffusion models in tasks such as image generation, we apply a diffusion model to customer churn prediction, a typical data mining scenario. For the numerical and categorical features in customer churn data, we generate data with a Gaussian diffusion model and a multinomial diffusion model, respectively, and we study and analyze both the predictive performance and the data privacy protection capability of the approach. We conduct extensive experiments on customer churn data from multiple domains to explore the possibility of fusing generated data with real data for data reconstruction. The results show that the diffusion model can generate high-quality data and yields some improvement across a variety of prediction methods, which helps alleviate data imbalance. In addition, the distribution of the data generated by the diffusion model is closer to that of the real data, which suggests potential value for user privacy protection.
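The forward (noising) step of the two diffusion processes named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses the standard closed forms for Gaussian diffusion on a numerical feature and for multinomial diffusion on a categorical feature with K classes, where the noised category is drawn from a mix of the one-hot original and the uniform distribution. The linear noise schedule, all function names, and the example row are assumptions introduced here; the abstract does not specify them.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed; the abstract does not name a schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t), one value per timestep."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def gaussian_forward(x0, alpha_bar_t, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, 1 - abar_t) for one numerical feature."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * noise


def multinomial_forward_probs(category, num_classes, alpha_bar_t):
    """q(x_t | x_0) = abar_t * onehot(x_0) + (1 - abar_t) / K for one categorical feature."""
    probs = [(1.0 - alpha_bar_t) / num_classes] * num_classes
    probs[category] += alpha_bar_t
    return probs


def multinomial_forward(category, num_classes, alpha_bar_t, rng):
    """Draw a noised category index from q(x_t | x_0)."""
    probs = multinomial_forward_probs(category, num_classes, alpha_bar_t)
    return rng.choices(range(num_classes), weights=probs, k=1)[0]


def noise_row(numeric, categorical, class_counts, t, abars, rng):
    """Noise one mixed-type row at timestep t: Gaussian for numeric, multinomial for categorical."""
    abar = abars[t]
    noisy_numeric = [gaussian_forward(x, abar, rng) for x in numeric]
    noisy_categorical = [
        multinomial_forward(c, k, abar, rng)
        for c, k in zip(categorical, class_counts)
    ]
    return noisy_numeric, noisy_categorical


if __name__ == "__main__":
    rng = random.Random(0)
    abars = alpha_bars(linear_betas(1000))
    # Hypothetical customer row: two standardized numerical features and
    # two categorical features with 3 and 2 classes.
    print(noise_row([0.5, -1.2], [1, 0], [3, 2], 500, abars, rng))
```

A generative model of this kind is trained to reverse these noising steps; sampling from the trained reverse process then yields synthetic rows that can be mixed with real ones, for example to add minority-class (churned) customers.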

Keywords: customer churn; diffusion model; user privacy; data generation; categorical features

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
