Authors: PEI Zheng-zhong (裴正中); ZHAO Xu-jun (赵旭俊) [1]
Affiliation: [1] College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2024, No. 12, pp. 3622-3630 (9 pages)
Funding: National Natural Science Foundation of China (61572343); Shanxi Province Applied Basic Research Program (20210302123223).
Abstract: Angle-based outlier detection methods generally suffer from high computational cost and a strong dependence on hyperparameter selection. To address both problems, a fast, nonparametric, angle-based method, HAOD, is proposed. The data set is first centered and described in polar coordinates. On this basis, an approximate representation of the vector-angle calculation function is proposed, so that vector angles can be represented by a one-dimensional sequential structure, which improves detection efficiency. The empirical cumulative distribution function is then used to compute the tail probabilities of the vector angle and of the vector modulus separately, and these serve as single-dimension tail scores. Finally, the aggregation of single-dimension tail scores is improved: the tail scores of the original vector and of its reversed vector are aggregated to obtain the final outlier score. Experiments on high-dimensional ODDS and UCI data sets show that HAOD outperforms five comparison methods in detection efficiency, with average improvements ranging from 28.74% to 84.71%.
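The general idea of scoring points by empirical tail probabilities of an angle and a modulus can be illustrated with a minimal Python sketch. This is not the HAOD algorithm: the abstract does not give its angle approximation or its aggregation rule, so the sketch substitutes assumptions of its own. The angle is measured against a fixed all-ones reference direction, and the per-feature score is the larger of the left-tail and right-tail negative log probabilities, summed over the two features. The function names `ecdf_tail_scores` and `angle_norm_outlier_scores` are hypothetical.

```python
import numpy as np


def ecdf_tail_scores(values):
    """Negative log of the empirical left- and right-tail probabilities."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    sorted_values = np.sort(values)
    # Empirical P(X <= x) and P(X >= x); both are at least 1/n, so the logs are finite.
    left = np.searchsorted(sorted_values, values, side="right") / n
    right = (n - np.searchsorted(sorted_values, values, side="left")) / n
    return -np.log(left), -np.log(right)


def angle_norm_outlier_scores(X):
    """Score each row by how extreme its angle and modulus are after centering."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    norms = np.linalg.norm(centered, axis=1)

    # Assumption: angle to a fixed unit reference direction (all ones),
    # standing in for the paper's unspecified angle representation.
    dim = X.shape[1]
    reference = np.ones(dim) / np.sqrt(dim)
    safe_norms = np.where(norms > 0, norms, 1.0)  # avoid division by zero
    cosines = np.clip(centered @ reference / safe_norms, -1.0, 1.0)
    angles = np.arccos(cosines)

    # Assumption: take the more extreme tail per feature, then sum the features.
    scores = np.zeros(X.shape[0])
    for feature in (angles, norms):
        left, right = ecdf_tail_scores(feature)
        scores += np.maximum(left, right)
    return scores
```

Calling `angle_norm_outlier_scores(X)` on an `(n, d)` array returns one score per row, with larger values indicating rows whose angle or modulus lies further in a tail. No hyperparameters are needed, which mirrors the nonparametric property the abstract claims, but the efficiency figures reported there do not apply to this sketch.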
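Keywords: high-dimensional data; outlier detection; angle-based; data isomorphization; polar coordinate representation; empirical cumulative distribution function; skewness
Classification No.: TP311.1 [Automation and Computer Technology: Computer Software and Theory]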