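Affiliation: [1] School of Computer Science and Engineering, Chongqing University
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2007, No. 4, pp. 651-659 (9 pages)
Funding: National Natural Science Foundation of China (60403009); Natural Science Foundation of Chongqing (2005BB2224)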
Abstract: Discovering and analyzing outliers is an important part of data mining. Outliers may carry crucial information, so in some applications, such as credit card fraud detection in finance, calling fraud detection in telecommunications, network intrusion detection, and disease diagnosis, detecting them matters more than detecting general patterns. Existing outlier mining algorithms focus on detecting and identifying outliers, yet the study of outliers covers both mining them and analyzing why they are exceptional, and research on explaining and analyzing outliers currently lags behind outlier mining technology. A full analysis of outliers inevitably requires substantial knowledge of the application domain. However, some further findings about outliers can be obtained by studying the distribution characteristics of the data set in attribute space. By analyzing the origin and characteristics of outliers and applying rough set theory, the paper defines the concept of outlying partition similarity and proposes COKAS, an algorithm for clustering outliers based on key attribute subspaces. The approach reveals the subspace characteristics of the outliers, provides extended knowledge about the identified outliers, and improves understanding of the whole data set. Experimental results on two real multi-dimensional data sets show that the algorithm is adaptable and effective.
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]