Detection of top-n global outliers in datasets based on hierarchical clustering  Cited by: 5


Author: LIANG Binmei[1,2]

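Affiliations: [1] School of Mathematics and Information Science, Guangxi University, Nanning 530004, China; [2] School of Computer Science, Sichuan University, Chengdu 610065, China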

Source: Computer Engineering and Applications (《计算机工程与应用》), 2012, No. 9, pp. 101-103, 107 (4 pages)

Funding: Guangxi University Scientific Research Fund (No. XJZ100258)

Abstract: Outliers make data mining results inaccurate or even wrong. Existing outlier detection algorithms still fall short in generality, effectiveness, user-friendliness, and performance on large, high-dimensional datasets. This paper therefore proposes an effective global outlier detection method. The method performs agglomerative hierarchical clustering, then uses the resulting clustering tree and the distance matrix to judge visually how isolated each data point is and to determine the number of outliers. Outliers are then removed without supervision by working through the clustering tree from the top down. Simulation experiments on several datasets show that the method effectively identifies the top-n global outliers with the highest degree of isolation, applies to datasets of different shapes, is efficient and user-friendly, and is suitable for outlier detection in large, high-dimensional datasets.
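The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The linkage criterion (single linkage), the Euclidean distance, the rule of peeling off the smaller branch at each split, and the names `build_tree`, `leaves`, and `top_n_outliers` are all assumptions made for illustration. The paper also relies on visual inspection of the clustering tree to choose n, which this sketch replaces with a user-supplied value.

```python
import math
from itertools import combinations


def build_tree(points):
    """Naive single-linkage agglomerative clustering (assumed linkage).

    Leaves are point indices; an internal node is (left, right, merge_distance).
    """
    def dist(a, b):
        return math.dist(points[a], points[b])

    # Each active cluster is a pair (tree, member_indices).
    active = [(i, [i]) for i in range(len(points))]
    while len(active) > 1:
        # Find the two clusters with the smallest single-linkage distance.
        best = None
        for (i, (_, mi)), (j, (_, mj)) in combinations(enumerate(active), 2):
            d = min(dist(a, b) for a in mi for b in mj)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        (ti, mi), (tj, mj) = active[i], active[j]
        merged = ((ti, tj, d), mi + mj)
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
    return active[0][0]


def leaves(tree):
    """Return the point indices under a (sub)tree."""
    if isinstance(tree, int):
        return [tree]
    left, right, _ = tree
    return leaves(left) + leaves(right)


def top_n_outliers(points, n):
    """Walk the tree from the root, peeling off the smaller branch at each split.

    Assumption: the most isolated points are the last to be merged, so they
    hang directly off the top levels of the tree.
    """
    tree, outliers = build_tree(points), []
    while len(outliers) < n and not isinstance(tree, int):
        left, right, _ = tree
        small, large = sorted((left, right), key=lambda t: len(leaves(t)))
        outliers.extend(leaves(small))
        tree = large
    return outliers[:n]
```

For a tight group of five points around the unit square plus two distant points at indices 5 and 6, `top_n_outliers(points, 2)` returns `[5, 6]`: the point merged last is reported first. The peeling rule is only reasonable when the top splits isolate a few points. If the root separates two large clusters, the smaller cluster would be flagged wholesale, which is where the visual check described in the abstract would come in.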

Keywords: outlier detection; hierarchical clustering; data mining

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
