Relative Subspaces and Outlier Mining in Gaussian Mixture Model (高斯混合模型下的相关子空间与离群数据挖掘)

Cited by: 5


Authors: FAN Pan-pan (樊盼盼), ZHANG Ji-fu (张继福) [1]

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2018, No. 11, pp. 2491-2496 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (61572343)

Abstract: A relevant subspace is a set of attributes (dimensions) associated with outliers; restricting the analysis to it effectively reduces the impact of the curse of dimensionality. This paper redefines the relevant subspace using a Gaussian mixture model and presents an outlier mining algorithm that operates in the relevant subspace. First, the k-nearest-neighbor algorithm determines the local dataset of each data object, and a global sparsity matrix is generated from the sparsity of the attribute values; this matrix reflects how sparse or dense the data are. Second, the Gaussian mixture model and the sparsity matrix are used to identify each data object's relevant and irrelevant subspaces, which keeps the irrelevant subspace from influencing the outlier measure. Then, within the relevant subspace, the outlier score of each data object is computed from the sparsity of each of its dimensions and the attribute weights, and the top-N objects with the highest outlier scores are selected as outliers. Finally, experiments on synthetic and UCI datasets validate the correctness and effectiveness of the algorithm.
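The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas, so the sparsity measure (mean absolute gap to the k neighbors, per attribute), the two-component one-dimensional mixture fitted per attribute, the 0.5 posterior threshold, and the use of the posterior as the attribute weight are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbours (Euclidean), excluding itself."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

def sparsity_matrix(X, k):
    """ASSUMED sparsity: per attribute, mean absolute gap to the k neighbours."""
    nbrs = knn_indices(X, k)                                # shape (n, k)
    return np.abs(X[nbrs] - X[:, None, :]).mean(axis=1)     # shape (n, d)

def sparse_posterior(v, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture to v by EM and return each
    value's responsibility under the component with the higher mean."""
    mu = np.percentile(v, [25.0, 75.0])
    var = np.full(2, v.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step, computed in log space to avoid underflow
        log_dens = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                    - 0.5 * (v[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp(log_dens[:, 0], log_dens[:, 1])
        resp = np.exp(log_dens - log_norm[:, None])
        # M-step, with small floors so no component degenerates completely
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp * v[:, None]).sum(axis=0) / nk
        var = (resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    return resp[:, np.argmax(mu)]

def mine_outliers(X, k=10, n_top=5):
    """Return (indices of the n_top highest scores, all scores, relevance mask)."""
    S = sparsity_matrix(X, k)
    post = np.column_stack([sparse_posterior(S[:, j]) for j in range(S.shape[1])])
    relevant = post > 0.5                       # ASSUMED relevance rule
    scores = (post * S * relevant).sum(axis=1)  # ASSUMED weights: the posterior
    top = np.argsort(-scores)[:n_top]
    return top, scores, relevant

# Synthetic example: 300 Gaussian points, one planted outlier in attributes 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
X[0, :2] = 10.0
top, scores, relevant = mine_outliers(X)
print(top)
```

Scoring only over attributes flagged as relevant is what keeps dense, uninformative dimensions from diluting the score; a library mixture model could replace the hand-written EM loop without changing the overall structure.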

Keywords: outlier mining; relevant subspace; Gaussian mixture model; sparsity

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
