Distributed Data Outlier Detection Algorithm Based on CART Decision Tree


Authors: ZHU Hua (朱华)[1], QIAO Yongjin (乔勇进)[2,3], DONG Guogang (董国钢)[1]

Affiliations: [1] School of Computer Science and Technology, Wuhan University of Bioengineering, Wuhan 430415, Hubei, China; [2] China Agricultural University, Beijing 100091, China; [3] Shanghai Academy of Agricultural Sciences, Shanghai 201403, China

Source: Modern Electronics Technique (《现代电子技术》), 2024, No. 16, pp. 157-162 (6 pages)

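Funding: National Natural Science Foundation of China, Young Scientists Fund project "Transcriptional Regulation Mechanism of Linalool Synthesis in Peach Fruit Based on the UV-B Signal Transduction Pathway" (32102451).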

Abstract: In distributed computing environments, outliers often indicate abnormal conditions in the data, such as failures, fraud, or attacks. Detecting outliers in distributed data allows these abnormal records to be handled centrally, which protects the security of the system and its data. Outlier detection must account for the scale and complexity of the data. It must also find outliers efficiently in a distributed setting. This paper therefore proposes a distributed data outlier detection algorithm based on the CART decision tree. When the CART decision tree is constructed, the inter-class center distance is used as the splitting criterion. The training data are then classified according to the separated categories to determine the type of each data item. On this basis, and because the distribution pattern of outliers differs from that of their surrounding data objects, a spatial local deviation factor (SLDF) is used to measure the degree of outlierness among the data objects in the space. The high-dimensional space is then partitioned into grids, and the SLDF algorithm is applied to detect the remaining outlier set, completing outlier detection for the distributed data. Experimental results show that the outlier detection error rate of the proposed method stays within 0.010. This indicates more accurate outlier detection on distributed data and good detection performance.
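The abstract names the two main steps but gives neither the split formula nor the SLDF definition. The following is therefore only a minimal sketch under loudly stated assumptions. It is not the authors' method.

Assumption (1): the "inter-class center distance" criterion is read as follows. For each candidate split, take the Euclidean distance between the centroids of the two child nodes, and keep the split that maximizes it.

Assumption (2): SLDF is replaced by a hypothetical LOF-style ratio. For a given point, this is its mean k-nearest-neighbour distance divided by the average of the same quantity over its neighbours.

Grid partitioning is modelled as pruning: only points that fall in sparsely populated cells are scored.

Every function and parameter name here is hypothetical and is not taken from the paper: `center_distance_split`, `detect_outliers`, `min_leaf`, `cell_size`, `sparse` and `threshold`.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def center_distance_split(points, min_leaf=2):
    """Assumed reading of the split criterion: over every (feature, threshold)
    pair, pick the split whose two children have the largest Euclidean distance
    between their centroids. Returns (feature, threshold, distance) or None."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [p for p in points if p[f] <= t]
            right = [p for p in points if p[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # enforce a minimum child size
            d = math.dist(centroid(left), centroid(right))
            if best is None or d > best[2]:
                best = (f, t, d)
    return best


def knn(points, i, k):
    """Indices of the k nearest neighbours of point i (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def mean_knn_dist(points, i, k):
    """Mean distance from point i to its (up to) k nearest neighbours."""
    nbrs = knn(points, i, k)
    return sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)


def deviation_factor(points, i, k):
    """Hypothetical stand-in for SLDF (LOF-style ratio, NOT the paper's formula):
    point i's mean k-NN distance divided by the average of its neighbours'."""
    neighbours = knn(points, i, k)
    own = mean_knn_dist(points, i, k)
    ref = sum(mean_knn_dist(points, j, k) for j in neighbours) / len(neighbours)
    if ref == 0:
        return float("inf") if own > 0 else 1.0
    return own / ref


def detect_outliers(points, cell_size, k=3, sparse=1, threshold=2.0):
    """Grid pruning, then scoring: only points in cells holding at most
    `sparse` points are scored; those scoring above `threshold` are flagged."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        # Integer cell index per dimension.
        cells[tuple(math.floor(x / cell_size) for x in p)].append(i)
    flagged = []
    for members in cells.values():
        if len(members) <= sparse:
            flagged.extend(i for i in members
                           if deviation_factor(points, i, k) > threshold)
    return sorted(flagged)


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(detect_outliers(pts, cell_size=2.0))        # [5]
print(center_distance_split(pts, min_leaf=2)[1])  # 0.75
```

With `min_leaf=1`, the same assumed criterion simply splits off the single point (10, 10) at threshold 5.5. This is why the sketch enforces a minimum child size when centroid distance alone drives the split. The abstract does not say how the paper handles this case.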

Keywords: CART decision tree; distributed data; outlier detection; inter-class distance; data classification; spatial local deviation factor

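CLC numbers: TN919-34 [Electronics and Telecommunications: Communication and Information Systems]; TP391 [Electronics and Telecommunications: Information and Communication Engineering]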
