检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:张忠平[1,2,3] 张玉停 刘伟雄 邓禹 ZHANG Zhongping;ZHANG Yuting;LIU Weixiong;DENG Yu(College of Information Science and Engineering,Yanshan University,Qinhuangdao 066004,China;The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province,Yanshan University,Qinhuangdao 066004,China;The Key Laboratory of Software Engineering of Hebei Province,Qinhuangdao 066004,China)
机构地区:[1]燕山大学信息科学与工程学院,河北秦皇岛066004 [2]河北省计算机虚拟技术与系统集成重点实验室,河北秦皇岛066004 [3]河北省软件工程重点实验室,河北秦皇岛066004
出 处:《计算机集成制造系统》2022年第12期3869-3878,共10页Computer Integrated Manufacturing Systems
基 金:河北省创新能力提升计划资助项目(20557640D)。
摘 要:离群点检测是数据挖掘研究的一个重要领域。在传统基于近邻的离群点检测方法中,k近邻关系被广泛使用。然而,随着数据分布的多样化和数据维度的增加,基于k近邻关系算法检测离群点的过程中易受不同类簇影响而检测效果不佳。针对以上问题,首先通过引入近邻树代替k近邻关系生成新的邻域集合,提出质心投影的概念用来刻画数据对象与其邻居点的分布特征,其次在数据对象邻居点逐渐增多的过程中,离群点和内部点质心投影变化不同,采用质心投影波动来衡量每个数据对象的离群程度,最终提出了基于质心投影波动的离群点检测算法。通过在人工数据集和真实数据集下进行的实验表明,该算法能有效且较为全面地检测离群点。Outlier detection is an important field of data mining research.In the traditional outlier detection method based on nearest neighbor,the k-nearest neighbor relationship is widely used.However,with the diversification of data distribution and the increase of data dimensions,the process of detecting outliers based on the k-nearest neighbor relationship algorithm is easily affected by different clusters and the detection effect is not satisfactory.To solve the above problems,a new neighborhood set was generated by introducing the nearest neighbor tree instead of the k-nearest neighbor relationship,and the concept of centroid projection was proposed to describe the distribution characteristics of the data object and its neighbors.As the neighbor points of the data object gradually increase,the centroid projections of outliers and internal points were different,and the centroid projection fluctuation was proposed to measure the degree of outlier of each data object.An outlier detection algorithm based on the fluctuation of centroid projection was proposed.Experiments on artificial data sets and real data sets showed that the proposed algorithm could effectively and comprehensively detect outliers.
关 键 词:数据挖掘 离群点检测 K近邻 近邻树 质心投影波动
分 类 号:TP311[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.106