Finding Outliers in High-Dimensional Space

Times cited: 44

Authors: WEI Li[1], GONG Xueqing[1], QIAN Weining[1], ZHOU Aoying[1]

Affiliation: [1] Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China

Source: Journal of Software (《软件学报》), 2002, No. 2, pp. 280-290 (11 pages)

Funding: National Natural Science Foundation of China (60003016; 60003008); National Key Basic Research and Development Program of China (973 Program) (G1998030404)

Abstract: In many KDD (knowledge discovery in databases) applications, such as fraud detection in e-commerce, finding exceptional instances, or outliers, is more interesting than finding common knowledge. Most existing outlier-detection methods handle only numerical attributes, and they can find outliers but cannot explain what the outliers mean. This paper presents a hypergraph-based definition of outliers that captures the locality of the data and gives a good explanation of each outlier. It also presents an algorithm, HOT (hypergraph-based outlier test), which detects outliers by computing three measures for each vertex in the hypergraph: support, belongingness, and size deviation. The algorithm handles both numerical and categorical attributes. Analysis shows that it finds outliers in high-dimensional data effectively.

Keywords: data mining; outlier; hypergraph model; clustering; knowledge discovery; high-dimensional database

CLC number: TP392 [Automation and Computer Technology: Computer Application Technology]
