A Fast Outlier Detection Algorithm for High Dimensional Categorical Data Streams  (Cited by: 21)


Authors: Zhou Xiaoyun (周晓云)[1], Sun Zhihui (孙志挥)[1], Zhang Baili (张柏礼)[1], Yang Yidong (杨宜东)[1]

Affiliation: [1] Department of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096, China

Source: Journal of Software (《软件学报》), 2007, No. 4, pp. 933-942 (10 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant No. 70371015; the Doctor Science Research Foundation of the Education Ministry of China under Grant No. 20040286009

Abstract: This paper addresses outlier detection in data streams. It proposes a new outlier metric for categorical data streams, the weighted frequent pattern outlier factor (WFPOF), and on that basis presents a fast outlier detection algorithm, FODFP-Stream (fast outlier detection for high dimensional categorical data streams based on frequent pattern). FODFP-Stream computes the outlier measure by dynamically discovering and maintaining frequent patterns, which lets it handle high dimensional categorical data streams effectively. It can also be extended to data streams with numerical attributes and mixed attributes. Experiments on synthetic and real data sets confirm the applicability and effectiveness of the approach.
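The abstract names WFPOF but does not give its formula, so the following is only a rough sketch of the general idea of scoring a record by the frequent patterns it contains. Everything here is an assumption made for illustration: the function names `frequent_patterns` and `weighted_score`, the brute-force pattern enumeration, the `max_len` cap, and in particular the weighting by pattern length divided by the number of attributes. It works on a static batch and does not attempt the dynamic pattern maintenance that FODFP-Stream performs on a stream.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(records, min_support, max_len=2):
    """Brute-force enumeration of frequent (attribute, value) itemsets.

    Returns a dict mapping each frequent pattern (a tuple of
    (attribute, value) pairs) to its support in `records`.
    Illustrative only: not the paper's mining procedure.
    """
    n = len(records)
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())  # fixed order so equal patterns hash alike
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def weighted_score(record, patterns, n_attrs):
    """Hypothetical weighted frequent-pattern score (assumed formula).

    Sums support * (pattern length / n_attrs) over the frequent patterns
    contained in `record`, normalised by the number of frequent patterns.
    A lower score means the record matches fewer frequent patterns and
    is therefore more outlier-like.
    """
    if not patterns:
        return 0.0
    items = set(record.items())
    total = sum(
        support * len(pattern) / n_attrs
        for pattern, support in patterns.items()
        if items.issuperset(pattern)
    )
    return total / len(patterns)
```

Under these assumptions, a record made up of rare attribute values contains no frequent pattern and scores 0, while a record that agrees with the majority scores higher; ranking records by ascending score would then surface candidate outliers.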

Keywords: data stream; outlier detection; frequent pattern; high dimension; concept drift

Classification No.: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
