Authors: LU Lili (陆莉莉)[1], ZHANG Yongpan (张永潘), TAN Haiyu (谈海宇), JI Yimu (季一木)[2]
Affiliations: [1] Institute of Computer & Software, Nanjing College of Information Technology, Nanjing 210023, China; [2] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2016, No. 12, pp. 1683-1692 (10 pages)
Funding: Natural Science Foundation of Jiangsu Province, Youth Fund, No. BK20130876; Research Fund of Nanjing College of Information Technology, No. YK20140402
Abstract: As research on big data applications deepens and stream computing frameworks for distributed machine learning emerge, concept drift in data streams has become one of the active research topics in big data mining. Existing work on concept drift relies mainly on data structure and algorithm optimization, and performs drift detection on a single computer with limited computing resources. To address this limitation, this paper proposes S-CVFDT (Storm-concept very fast decision tree), a Storm-based classification mining algorithm for big data that resists concept drift, together with a system that implements it. The system combines a parallel window mechanism with the S-CVFDT algorithm: the parallel window mechanism detects abrupt concept drift in the data stream and adaptively adjusts the parallel window size, while S-CVFDT continuously updates the model under gradual concept drift. Analysis and experimental results show that the algorithm detects abrupt concept drift quickly and effectively, reduces the resources wasted because of abrupt concept drift, and improves both model-building efficiency and classification accuracy.
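The abstract does not give the details of the window mechanism, so the following is only a minimal single-process sketch of the general idea of window-based abrupt-drift detection with an adaptive window size. It is not the S-CVFDT algorithm and uses no Storm API. The class `AdaptiveWindowDetector`, its parameters (`min_size`, `max_size`, `delta`), and the choice to compare the error rates of the older and newer halves of the window against a Hoeffding bound are all assumptions made for illustration.

```python
import math
from collections import deque


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the mean of n observations
    lies within this distance of the true mean (values span value_range)."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


class AdaptiveWindowDetector:
    """Hypothetical abrupt-drift detector over a stream of 0/1 prediction errors.

    Illustrative only: the paper's parallel window mechanism is not specified
    in the abstract, so this design is an assumption.
    """

    def __init__(self, min_size: int = 20, max_size: int = 200, delta: float = 0.05):
        self.min_size = min_size
        self.max_size = max_size
        self.delta = delta
        self.size = 2 * min_size  # current window capacity, adapted over time
        self.window = deque()

    def add(self, error: int) -> bool:
        """Add one error indicator (1 = misclassified); return True if drift is flagged."""
        self.window.append(error)
        while len(self.window) > self.size:
            self.window.popleft()

        n = len(self.window)
        if n < self.min_size:
            return False  # not enough data to compare the two halves

        half = n // 2
        items = list(self.window)
        old, new = items[:half], items[half:]
        old_rate = sum(old) / len(old)
        new_rate = sum(new) / len(new)
        eps = hoeffding_bound(1.0, self.delta, half)

        if new_rate - old_rate > eps:
            # Error rate jumped: drop the stale half and shrink the window
            # so that the model can be rebuilt from recent data only.
            self.window = deque(new)
            self.size = max(self.min_size, len(new))
            return True

        # Stream looks stable: let the window grow up to max_size.
        self.size = min(self.max_size, self.size + 1)
        return False
```

In this sketch a stable stream lets the window grow toward `max_size`, while a sudden rise in the error rate shrinks it to the most recent half. A distributed version would additionally need to partition the stream across workers, which is outside the scope of this example.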
Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]