Improved Data Stream Clustering Method: Incorporating KD-Tree for Typicality and Eccentricity-Based Approach  


Authors: Dayu Xu, Jiaming Lu, Xuyao Zhang, Hongtao Zhang

Affiliations: [1] College of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou, 311300, China [2] College of Economics and Management, Zhejiang A&F University, Hangzhou, 311300, China

Source: Computers, Materials & Continua, 2024, Issue 2, pp. 2557-2573 (17 pages)

Funding: This research was funded by the National Natural Science Foundation of China (Grant No. 72001190); by the Ministry of Education's Humanities and Social Science Project via the China Ministry of Education (Grant No. 20YJC630173); and by Zhejiang A&F University (Grant No. 2022LFR062).

Abstract: Data stream clustering is integral to contemporary big data applications. However, handling the continuous influx of data efficiently and accurately remains a primary challenge in current research. This paper aims to improve the efficiency and precision of data stream clustering. Taking the TEDA (Typicality and Eccentricity Data Analysis) algorithm as a foundation, we integrate a nearest neighbor search algorithm to enhance both its efficiency and its accuracy. The original TEDA algorithm, grounded in the concept of "Typicality and Eccentricity Data Analytics", is an evolving, recursive method that requires no prior knowledge. While the algorithm autonomously creates and merges clusters as new data arrives, its efficiency is significantly hindered by the need to traverse all existing clusters each time a new data point arrives. This work presents the NS-TEDA (Neighbor Search Based Typicality and Eccentricity Data Analysis) algorithm, which incorporates a KD-Tree (K-Dimensional Tree) integrated with the Scapegoat Tree. This ensures that each newly arrived data point interacts only with clusters in very close proximity. The restriction significantly improves efficiency while preventing a single data point from joining too many clusters and, to some extent, mitigating the merging of clusters with high overlap. We apply the NS-TEDA algorithm to several well-known datasets, comparing its performance with other data stream clustering algorithms and with the original TEDA algorithm. The results demonstrate that the proposed algorithm achieves higher accuracy and that its runtime depends almost linearly on the volume of data, making it more suitable for large-scale data stream analysis.
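The idea of letting a new point interact only with nearby clusters can be illustrated with a minimal Python sketch. It is not the authors' NS-TEDA implementation: the names `Cluster`, `build_kdtree`, `within_radius`, `process`, and the parameters `radius` and `m` are hypothetical, and the recursive mean, variance, and eccentricity formulas are the forms commonly given for TEDA in the literature rather than anything stated in this record. For brevity the sketch rebuilds the KD-tree for every point, whereas the abstract describes a tree kept balanced in the manner of a Scapegoat Tree.

```python
import math


class Cluster:
    """Running statistics of one cluster (hypothetical helper)."""

    def __init__(self, x):
        self.k = 1              # number of points absorbed so far
        self.mean = list(x)
        self.var = 0.0

    def try_add(self, x, m):
        """Tentatively add x; commit only if x is not too eccentric."""
        k = self.k + 1
        mean = [((k - 1) * mu + xi) / k for mu, xi in zip(self.mean, x)]
        d2 = sum((xi - mu) ** 2 for xi, mu in zip(x, mean))
        var = (k - 1) / k * self.var + d2 / (k - 1)
        # Eccentricity of x; identical points (var == 0) count as typical.
        ecc = 1 / k + (d2 / (k * var) if var > 0 else 0.0)
        # Chebyshev-style test on the normalized eccentricity ecc / 2.
        if ecc / 2 > (m * m + 1) / (2 * k):
            return False
        self.k, self.mean, self.var = k, mean, var
        return True


def build_kdtree(items, depth=0):
    """Build a static KD-tree from a list of (point, payload) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def within_radius(node, x, r, out):
    """Collect payloads of all stored points within distance r of x."""
    if node is None:
        return out
    (point, payload), axis, left, right = node
    if math.dist(point, x) <= r:
        out.append(payload)
    diff = x[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    within_radius(near, x, r, out)
    # Cross the splitting plane only if the search ball reaches it.
    if abs(diff) <= r:
        within_radius(far, x, r, out)
    return out


def process(stream, radius=1.0, m=3.0):
    """Cluster a stream, testing each point only against nearby clusters."""
    clusters = []
    for x in stream:
        # Assumption: rebuild per point; a real system would update in place.
        tree = build_kdtree([(c.mean, i) for i, c in enumerate(clusters)])
        joined = False
        for i in within_radius(tree, x, radius, []):
            joined = clusters[i].try_add(x, m) or joined
        if not joined:
            clusters.append(Cluster(x))
    return clusters


stream = [(0, 0), (0.1, 0), (10, 10), (0, 0.1), (10.1, 10), (10, 10.1)]
clusters = process(stream, radius=1.0, m=3.0)
print([(c.k, [round(v, 3) for v in c.mean]) for c in clusters])
```

In this sketch the radius query replaces the full traversal of all clusters that the abstract identifies as the bottleneck of the original algorithm; how the paper actually chooses the neighborhood is not stated in this record.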

Keywords: Data stream clustering; TEDA; KD-Tree; scapegoat tree

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
