Authors: DU Xusheng (杜旭升); YU Jiong (于炯) [1,2]; YE Lele (叶乐乐); CHEN Jiaying (陈嘉颖)
Affiliations: [1] School of Software, Xinjiang University, Urumqi, Xinjiang 830008, China; [2] College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; [3] School of Software Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
Source: Journal of Computer Applications (《计算机应用》), 2020, No. 5, pp. 1322-1328 (7 pages)
Funding: National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078).
Abstract: Outlier detection algorithms are widely used in fields such as network intrusion detection and medical aided diagnosis. The classic Local Distance-Based Outlier Factor (LDOF), Cohesiveness-Based Outlier Factor (CBOF) and Local Outlier Factor (LOF) algorithms suffer from long execution times and low detection rates on large-scale and high-dimensional datasets. To address these problems, an outlier detection algorithm Based on Graph Random Walk (BGRW) is proposed. First, the number of iterations, the damping factor and the outlier degree of every object in the dataset are initialized. Second, the transition probability of the walker between objects is derived from the Euclidean distance between them. Third, the outlier degree of every object is computed iteratively. Finally, the objects with the highest outlier degree are output as outliers. BGRW is compared with LDOF, CBOF and LOF in terms of execution time, detection rate and false positive rate on real UCI (University of California, Irvine) datasets and on synthetic datasets with complex distributions. The experimental results show that BGRW reduces execution time and outperforms the compared algorithms in both detection rate and false positive rate.
Keywords: data mining; outlier detection; Markov chain; random walk; LDOF; CBOF; LOF
Classification: TP311.1 [Automation and Computer Technology: Computer Software and Theory]
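The four steps listed in the abstract can be illustrated with a minimal sketch. The abstract does not give the transition formula or the update rule, so two assumptions are made here: the walker moves from object i to object j with probability proportional to their Euclidean distance, and scores are updated with a PageRank-style damped iteration. The names `bgrw_scores` and `top_outliers`, and the default parameter values, are hypothetical and are not taken from the paper.

```python
import math


def bgrw_scores(points, damping=0.85, iterations=100):
    """Return one random-walk score per point (higher = more outlying).

    Illustrative only: the transition rule and the damped update below
    are assumptions, not the formulas published in the paper.
    """
    n = len(points)

    # Step 2 (assumed form): transition probability from i to j is
    # proportional to the Euclidean distance between i and j.
    transition = []
    for p in points:
        row = [math.dist(p, q) for q in points]
        total = sum(row)
        # If all points coincide, fall back to a uniform jump.
        transition.append([d / total if total > 0 else 1.0 / n for d in row])

    # Step 1: every object starts with the same score.
    scores = [1.0 / n] * n

    # Step 3: damped iteration (PageRank-style update, assumed).
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


def top_outliers(points, k=1, **kwargs):
    """Step 4: return the indices of the k highest-scoring objects."""
    scores = bgrw_scores(points, **kwargs)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:k]
```

Under these assumptions, an object that is far from most others receives large incoming transition probabilities and so accumulates a high score. For example, with four points on the unit square plus one point at (10, 10), `top_outliers(points, k=1)` selects the distant point.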