Multi-source Outlier Detection Algorithm Based on Relevant Subspace (基于相关子空间的多源离群检测算法)  Cited by: 1

Authors: MA Yang[1], ZHAO Xujun[1]

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Source: Computer Engineering and Applications (《计算机工程与应用》), 2021, No. 17, pp. 88-95 (8 pages)

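Funding: National Natural Science Foundation of China (61572343, U1931209); Applied Basic Research Program of Shanxi Province (201901D111257, 201901D211303); Key Research and Development Program of Shanxi Province (201903D121116); Scientific Research Start-up Fund of Taiyuan University of Science and Technology (20192013).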

Abstract: Most traditional outlier detection methods operate on a single dataset, or on a single dataset obtained by fusing multiple data sources, so their results ignore the association knowledge among multi-source data and key information within individual data sources. To detect outlier association knowledge across multi-source data, this paper proposes RSMOD, a Multi-source Outlier Detection algorithm based on Relevant Subspace. First, an object influence space for multi-source data is defined by combining the two-way influence of the k-nearest-neighbor set and the reverse-nearest-neighbor set, which improves the accuracy of outlier measurement. Second, building on the influence space, a sparse factor and a sparse difference factor for multi-source data are proposed to characterize how sparse a data object is across multi-source data. Third, the measurement of the relevant subspace is redefined so that it applies to multi-source datasets, and an outlier detection algorithm based on the relevant subspace is given. Finally, experiments on synthetic datasets and a real US census dataset verify the performance of RSMOD and analyze the outlier association knowledge derived from multiple datasets.
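
The abstract does not give the paper's formal definitions, so the following is only a minimal sketch of the general idea behind an influence space: the union of an object's k-nearest-neighbor set and its reverse-nearest-neighbor set, computed here on a single dataset with Euclidean distance. The function names `k_nearest` and `influence_space` and the toy data are hypothetical, and the multi-source extension, the sparse factor, and the sparse difference factor of RSMOD are not reproduced.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    # Sort by distance; ties are broken by index to keep the result deterministic.
    others.sort(key=lambda j: (dist(points[i], points[j]), j))
    return set(others[:k])


def influence_space(points, k):
    """For each object, the union of its kNN set and its reverse-kNN set.

    Assumption: this follows the common kNN-plus-reverse-kNN construction,
    not necessarily the exact definition used in the paper.
    """
    knn = [k_nearest(points, i, k) for i in range(len(points))]
    rnn = [set() for _ in points]
    for i, neighbours in enumerate(knn):
        for j in neighbours:
            rnn[j].add(i)  # i counts j as a neighbour, so i belongs to RNN(j)
    return [knn[i] | rnn[i] for i in range(len(points))]


# Hypothetical toy data: a tight cluster of four points plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 9.0)]
spaces = influence_space(points, k=2)
```

In this toy data the distant point (index 4) has nearest neighbors inside the cluster, but no cluster point counts it among its own two nearest neighbors, so its reverse-nearest-neighbor set is empty. That asymmetry is the kind of two-way information a kNN-only measure would miss.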

Keywords: outlier detection; multi-source data; subspace; data mining; sparse factor

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
