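Author: Liang Binmei (梁斌梅) [1,2]
Affiliations: [1] School of Mathematics and Information Science, Guangxi University, Nanning 530004; [2] College of Computer Science, Sichuan University, Chengdu 610065
Source: Computer Engineering and Applications (《计算机工程与应用》), 2012, No. 9, pp. 101-103, 107 (4 pages)
Funding: Guangxi University Scientific Research Fund (No. XJZ100258)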
Abstract: The existence of outliers often leads to inaccurate, or even wrong, results in data mining. Existing outlier detection algorithms still fall short in generality, effectiveness, user-friendliness, and performance on large, high-dimensional datasets. To address this, an effective global outlier detection method is proposed. The method performs agglomerative hierarchical clustering, uses the clustering tree and the distance matrix to judge visually how isolated the data points are, and determines the number of outliers. Outlying points are then removed without supervision, working from the top of the clustering tree downward. Simulation experiments on multiple datasets show that the method effectively identifies the top n global outliers with the highest degree of isolation, works on datasets of different shapes, is efficient and user-friendly, and is applicable to outlier detection in large, high-dimensional datasets.
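The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the linkage criterion or the exact top-down rule, so single linkage and the rule "at each split, treat the smaller branch as outlying" are assumptions made here for illustration, and the function names (`build_tree`, `leaves`, `top_n_outliers`) are hypothetical. The visual inspection step that fixes the number of outliers is replaced by a user-supplied `n`.

```python
import math


def build_tree(points):
    """Naive single-linkage agglomerative clustering (assumed linkage).

    A leaf is a point index; an internal node is (left, right, merge_distance).
    """
    # Each entry pairs the member indices of a cluster with its tree node.
    clusters = [([i], i) for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: smallest pairwise distance between the two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x][0], clusters[y][0])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merged = (clusters[x][0] + clusters[y][0],
                  (clusters[x][1], clusters[y][1], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][1]


def leaves(node):
    """Return all point indices under a tree node."""
    if isinstance(node, int):
        return [node]
    left, right, _ = node
    return leaves(left) + leaves(right)


def top_n_outliers(points, n):
    """Walk the tree from the root down, peeling off the smaller branch
    at each split until n points are collected (assumed rule)."""
    node = build_tree(points)
    outliers = []
    while len(outliers) < n and not isinstance(node, int):
        left, right, _ = node
        small, large = sorted((left, right), key=lambda c: len(leaves(c)))
        outliers.extend(leaves(small))
        node = large
    return outliers[:n]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (-8, 7)]
    print(top_n_outliers(data, 2))
```

The naive tree construction is cubic or worse in the number of points, so it is suitable only for small examples. The simple "smaller branch" rule would also mislabel points when the data contain two clusters of similar size, which is one reason the abstract's inspection of the tree and distance matrix matters when choosing the number of outliers.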
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]