Distributed sequential classification learning model based on Spark (基于Spark的分布式时序分类学习模型)  Cited by: 1

Authors: SHEN Yan (申彦) [1]; JING Lu-yi (敬露艺); ZHANG Shi-xiang (张士翔)

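Affiliations: [1] Department of Information Management and Information System, Jiangsu University, Zhenjiang 212013, Jiangsu, China  [2] School of Management, Jiangsu University, Zhenjiang 212013, Jiangsu, China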

Source: Computer Engineering and Design (《计算机工程与设计》), 2023, No. 4, pp. 1042-1049 (8 pages)

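Funding: Jiangsu Province Industry-University-Research Cooperation Fund Project (BY2021075); National Natural Science Foundation of China Project (61702229); Ministry of Education Industry-University Cooperative Education Fund Project (201902128024); Jiangsu Province Basic Research Program (Natural Science Foundation) Project (BK20150531); National Statistical Science Research Fund Project (2016LY17).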

Abstract: The LearnNSE algorithm retains every base classifier in its ensemble and adjusts the base-classifier weights relatively slowly, so its classification learning is inefficient on big data accumulated over a long period. It also attends only to accumulated big data and neglects big data generated suddenly within a short time. To overcome these disadvantages, a distributed sequential classification learning model based on Spark, named DSCLM-spark, is studied on the basis of the previously proposed PFLearnNSE-Pruned-Age algorithm. Experimental results show that DSCLM-spark achieves classification accuracy very close to that of LearnNSE, and better in many scenarios, while further improving the execution efficiency of ensemble classification learning and accommodating big data generated both in short bursts and through long-term accumulation. The model is suitable for classification mining tasks with strict real-time requirements.
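The contrast the abstract draws between keeping every base classifier and pruning by age can be illustrated with a small sketch. This is not the authors' PFLearnNSE-Pruned-Age algorithm or the DSCLM-spark model, neither of which is specified on this page. All names (`ThresholdStump`, `fit_stump`, `AgePrunedEnsemble`, `max_members`) are hypothetical, the weight is a plain log-odds of each member's error on the latest batch (a simplification chosen for brevity), and nothing here is distributed over Spark.

```python
import math
from collections import deque


class ThresholdStump:
    """Hypothetical 1-D base classifier: predicts 1 when x >= threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return 1 if x >= self.threshold else 0


def fit_stump(batch):
    """Fit a stump on one batch of (x, label) pairs.

    Assumes both labels occur in the batch and that class 1 lies above class 0;
    the threshold is placed midway between the two class means.
    """
    zeros = [x for x, y in batch if y == 0]
    ones = [x for x, y in batch if y == 1]
    return ThresholdStump((sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2)


class AgePrunedEnsemble:
    """Weighted-vote ensemble that trains one new member per incoming batch."""

    def __init__(self, max_members=None):
        # deque(maxlen=k) drops the oldest member once k is exceeded (age pruning).
        # max_members=None keeps every member ever trained.
        self.members = deque(maxlen=max_members)
        self.weights = []

    def update(self, batch):
        """Add a member trained on the batch, then re-weight all members on it."""
        self.members.append(fit_stump(batch))
        self.weights = [self._weight(m, batch) for m in self.members]

    @staticmethod
    def _weight(member, batch):
        errors = sum(member.predict(x) != y for x, y in batch)
        eps = min(max(errors / len(batch), 0.01), 0.99)  # clip so the log stays finite
        # Members at or below chance level on the latest batch get no vote.
        return max(math.log((1 - eps) / eps), 0.0)

    def predict(self, x):
        score = sum(w if m.predict(x) == 1 else -w
                    for m, w in zip(self.members, self.weights))
        return 1 if score >= 0 else 0


# Two batches with a shifted decision boundary (a toy non-stationary stream).
old_batch = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
new_batch = [(5.0, 0), (6.0, 0), (9.0, 1), (10.0, 1)]

keep_all = AgePrunedEnsemble()             # retains every base classifier
pruned = AgePrunedEnsemble(max_members=1)  # retains only the newest one
for batch in (old_batch, new_batch):
    keep_all.update(batch)
    pruned.update(batch)
```

In this toy stream the unpruned ensemble still has to evaluate and re-weight the outdated member on every batch, which is the per-batch cost that grows with stream length; the pruned variant bounds that cost at `max_members` evaluations.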

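Keywords: classification algorithm; big data mining; ensemble learning; incremental learning; non-stationary environment; distributed system; computer cluster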

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
