Online Learning Algorithm Based on Streaming Data

Author: Zhang Lili (张丽丽)

Affiliation: [1] School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong

Source: Advances in Applied Mathematics (《应用数学进展》), 2025, No. 2, pp. 463-473 (11 pages)

Abstract: This paper proposes an online learning algorithm for streaming data subject to concept drift. To improve the speed and accuracy of prediction, it introduces a multi-step prediction regression ensemble model and describes in detail a sample resampling procedure combined with a clustering algorithm, intended to cope with the high dimensionality and large scale of streaming data. The resampled samples are fed into an online adaptive framework based on sliding windows, which together with the multi-step prediction regression model forms the proposed online learning algorithm; the algorithm can identify and handle concept drift in a timely manner. The paper also provides a statistical theoretical basis for concept drift, which supports the accuracy of the algorithm. For intersection traffic flow and website pageview data, it identifies the types of concept drift present and introduces a Boolean factor for sudden drift that effectively reduces the adverse effects of such drift. In the empirical evaluation, the proposed method performs well in both accuracy and stability.
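The abstract gives no formulas or pseudocode, so the following Python sketch is only a hypothetical illustration of the general idea of a sliding-window online learner with a Boolean flag for sudden drift. It is not the paper's method: the multi-step regression ensemble and the clustering-based resampling are replaced by a plain window-mean forecaster, and the drift rule (flag an observation whose absolute error exceeds the recent mean error by `threshold` standard deviations) is an assumed stand-in. All names and parameters (`SlidingWindowForecaster`, `window_size`, `threshold`, `min_samples`, `run_stream`) are invented for this example.

```python
from collections import deque
from statistics import mean, pstdev


class SlidingWindowForecaster:
    """Hypothetical sketch: window-mean forecaster with a Boolean sudden-drift flag."""

    def __init__(self, window_size=20, threshold=3.0, min_samples=5):
        self.window = deque(maxlen=window_size)  # most recent observations
        self.errors = deque(maxlen=window_size)  # most recent absolute errors
        self.threshold = threshold               # assumed k-sigma cut-off
        self.min_samples = min_samples           # errors needed before testing

    def predict(self):
        """One-step forecast: the mean of the current window (0.0 if empty)."""
        return mean(self.window) if self.window else 0.0

    def update(self, y):
        """Consume one observation; return True if sudden drift was flagged."""
        if not self.window:
            # Nothing to compare against yet: just store the first value.
            self.window.append(y)
            return False

        error = abs(y - self.predict())
        drift = False
        if len(self.errors) >= self.min_samples:
            baseline = mean(self.errors)
            spread = max(pstdev(self.errors), 1e-9)  # guard against zero spread
            drift = error > baseline + self.threshold * spread

        if drift:
            # Boolean flag set: discard stale history so the model
            # re-adapts to the new regime instead of averaging across it.
            self.window.clear()
            self.errors.clear()
        else:
            self.errors.append(error)
        self.window.append(y)
        return drift


def run_stream(values, **kwargs):
    """Feed a sequence through the forecaster; return indices flagged as drift."""
    model = SlidingWindowForecaster(**kwargs)
    return [i for i, y in enumerate(values) if model.update(y)]


if __name__ == "__main__":
    # Synthetic stream (not data from the paper): level 10, then a jump to 50.
    stable = [10.0, 10.5, 9.5, 10.2, 9.8] * 4
    shifted = [v + 40.0 for v in stable]
    print(run_stream(stable + shifted))
```

On the synthetic stream in the `__main__` block, the only flagged index is the first observation after the level shift. Clearing the window when the flag fires is one simple way to keep a sudden jump from contaminating later forecasts; gradual drift would need a different treatment, which this sketch does not attempt.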

Keywords: streaming data; concept drift; online learning

Classification code: TP3 [Automation and Computer Technology - Computer Science and Technology]

 
