Study and Application of Tobacco Anomaly Data Mining Based on Ensemble Learning

Cited by: 3


Authors: LI Tian-ju (李天举), XIE Zhi-feng (谢志峰) [1], ZHANG Kan-hong (张侃弘), TAO Yi-jun (陶亦筠), FAN Jie (范杰) [2], TANG Zhen (汤臻)

Affiliations: [1] Shanghai University, Shanghai 200072, China; [2] Shanghai Tobacco Group Co., Ltd., Shanghai 200082, China; [3] Shanghai Tobacco Monopoly Administration, Shanghai 200120, China

Source: Computer Technology and Development (《计算机技术与发展》), 2020, No. 11, pp. 128-135 (8 pages)

Funding: National Natural Science Foundation of China (61303093); Natural Science Foundation of Shanghai (19ZR1419100).

Abstract: To support the transformation of tobacco monopoly market supervision in Shanghai and make that supervision more effective, anomaly data mining is introduced to strengthen the prediction and analysis of abnormal market activity. Drawing on current machine learning research, the paper proposes a tobacco anomaly data mining model based on multi-model Stacking ensemble learning, which combines several algorithm models so that each contributes its strengths. The data set consists of tobacco monopoly data from January 2016 to April 2019. Preprocessing converts the raw data into indicators, and data augmentation is used to alleviate data imbalance to some extent. Validation on this data shows that the stronger the learning ability of the individual machine learning algorithms in the Stacking model, and the lower the correlation among them, the better the predictions of the ensemble. Finally, the model's effectiveness was verified through field inspections: following citywide validation, the rate at which market inspections confirm problems among retailers can rise from about 5% at present to more than 15%.
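The abstract does not name the base learners, the meta-learner, or the features used, so the following is only a minimal sketch of the general Stacking procedure, not the authors' model. Everything in it is an assumption made for illustration: the synthetic two-feature data, the threshold-style base learners (`fit_stump`), the logistic-regression meta-learner (`fit_logistic`), and the 5-fold split. It shows the core idea: base learners produce out-of-fold scores, and a second-level model learns how to combine them.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_stump(X, y, j):
    """Hypothetical base learner: score feature j against the midpoint of the class means."""
    pos = [x[j] for x, label in zip(X, y) if label == 1]
    neg = [x[j] for x, label in zip(X, y) if label == 0]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: sigmoid(x[j] - thr)


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Hypothetical meta-learner: logistic regression fitted by batch gradient descent."""
    m = len(Z[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * m, 0.0
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            gb += err
            for i in range(m):
                gw[i] += err * z[i]
        b -= lr * gb / len(Z)
        w = [wi - lr * gi / len(Z) for wi, gi in zip(w, gw)]
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def fit_stacking(X, y, base_fitters, k=5):
    n = len(X)
    # Out-of-fold meta-features: each row is scored by base models that never saw it.
    Z = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(base_fitters):
            model = fit(X_tr, y_tr)
            for i in held_out:
                Z[i][j] = model(X[i])
    meta = fit_logistic(Z, y)
    # Refit every base learner on all the data for use at prediction time.
    bases = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([base(x) for base in bases])


def make_data(n, seed):
    """Synthetic stand-in data: label 1 marks an 'anomalous' record (assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 3.0 * label
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


# Two base learners that look at different features, so their errors are weakly correlated.
base_fitters = [
    lambda X, y: fit_stump(X, y, 0),
    lambda X, y: fit_stump(X, y, 1),
]

X_train, y_train = make_data(200, seed=0)
X_test, y_test = make_data(200, seed=1)
predict = fit_stacking(X_train, y_train, base_fitters)
accuracy = sum(
    (predict(x) > 0.5) == (label == 1) for x, label in zip(X_test, y_test)
) / len(y_test)
print(f"stacked accuracy: {accuracy:.3f}")
```

The two base learners here read different features, which loosely mirrors the abstract's observation that weakly correlated base models combine well. In practice a library implementation such as scikit-learn's `StackingClassifier` would normally replace the hand-written loop.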

Keywords: anomaly data mining; ensemble learning; data preprocessing; data augmentation; Stacking model

Classification number: TP399 [Automation and Computer Technology: Computer Application Technology]

 
