Outlier detection method based on homogeneous angle


Authors: PEI Zheng-zhong; ZHAO Xu-jun [1]

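Affiliation: [1] College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China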

Source: Computer Engineering and Design, 2024, No. 12, pp. 3622-3630 (9 pages)

Funding: National Natural Science Foundation of China (61572343); Shanxi Province Applied Basic Research Program (20210302123223).

Abstract: Angle-based outlier detection methods generally suffer from high computational cost and a strong dependence on hyperparameter selection. To address these problems, a fast, nonparametric angle-based method, HAOD, is proposed. The data set is first centered and described in polar coordinates. On this basis, an approximate representation of the vector-angle calculation function is proposed; it represents vector angles with a one-dimensional sequential structure, which improves detection efficiency. The empirical cumulative distribution function is then introduced to compute the tail probabilities of the vector angle and of the vector norm separately, and these probabilities serve as single-dimension tail scores. Finally, the aggregation of single-dimension tail scores is improved: the tail scores of the original vector and of its reversed vector are aggregated to obtain the final outlier score. Experiments on ODDS and UCI high-dimensional data sets show that HAOD outperforms five comparison methods in detection efficiency, with average improvements ranging from 28.74% to 84.71%.
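The pipeline outlined in the abstract (centering, a norm-and-angle description of each point, ECDF tail probabilities, score aggregation) can be illustrated with a minimal sketch. This is not the HAOD algorithm itself, whose details the abstract does not give. Several choices below are assumptions made only for illustration: the angle is measured against a single fixed reference direction (the normalized all-ones vector), tail probabilities are turned into scores with a negative logarithm, the two angle tails stand in for the original/reversed-vector aggregation, and per-feature scores are simply summed. The function names `ecdf_tail_scores` and `angle_norm_outlier_scores` are hypothetical.

```python
import numpy as np


def ecdf_tail_scores(values):
    """Return (-log left-tail, -log right-tail) ECDF probabilities per value."""
    n = len(values)
    sorted_vals = np.sort(values)
    # Left tail: fraction of samples <= v (always >= 1/n, so the log is finite).
    left = np.searchsorted(sorted_vals, values, side="right") / n
    # Right tail: fraction of samples >= v.
    right = (n - np.searchsorted(sorted_vals, values, side="left")) / n
    return -np.log(left), -np.log(right)


def angle_norm_outlier_scores(X):
    """Illustrative outlier score from the norm and one angle of centered data."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # center the data set
    norms = np.linalg.norm(centered, axis=1)  # radial coordinate

    # ASSUMPTION: one angle per point, measured against a fixed reference
    # direction. The paper's approximate angle representation is not
    # described in the abstract and is not reproduced here.
    dim = X.shape[1]
    ref = np.ones(dim) / np.sqrt(dim)
    safe_norms = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    cosines = np.clip(centered @ ref / safe_norms, -1.0, 1.0)
    angles = np.arccos(cosines)

    # Norm: only a large norm is unusual, so use the right tail.
    _, norm_right = ecdf_tail_scores(norms)
    # Angle: either tail may be unusual, so keep the larger tail score.
    angle_left, angle_right = ecdf_tail_scores(angles)

    # ASSUMPTION: aggregate by summing the single-feature tail scores.
    return norm_right + np.maximum(angle_left, angle_right)
```

For example, calling `angle_norm_outlier_scores` on eight points spread around the unit circle plus one distant point at (20, 20) gives the distant point the largest score, because it is extreme in both norm and angle. Each feature needs only one sort and two binary searches per point, and no neighborhood size or other hyperparameter has to be tuned, which is in the spirit of the fast, nonparametric goal stated above.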

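Keywords: high-dimensional data; outlier detection; angle-based; data homogenization; polar coordinate representation; empirical cumulative distribution function; skewness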

Classification code: TP311.1 [Automation and Computer Technology: Computer Software and Theory]

 
