Attribute Group Weight-based Outlier Detection for Categorical Data

Times cited: 1

Authors: 张凯棋, 宋亦静, 陈鑫 (ZHANG Kai-qi; SONG Yi-jing; CHEN Xin), School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Affiliation: [1] School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024

Source: Computer Technology and Development (《计算机技术与发展》), 2023, No. 11, pp. 20-27 (8 pages)

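Funding: Shanxi Province Basic Research Program (202103021223267); Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi (2021L297); Scientific Research Start-up Fund of Taiyuan University of Science and Technology (20212053, 20222107).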

Abstract: Attribute grouping is an effective technique for high-dimensional outlier detection because it alleviates the interference of the "curse of dimensionality". However, existing attribute-grouping outlier detection methods reflect neither the differences among attribute groups nor the degree to which each group deviates. This significantly degrades the effectiveness and efficiency of high-dimensional outlier detection. This paper uses the cumulative sum of information entropy to characterize the differences among attribute groups and proposes an attribute group weight-based outlier detection method for categorical data. First, an attribute group deviation factor is defined from data pattern frequencies and code lengths, and is used as the criterion for merging attribute groups. It captures how strongly each attribute group deviates and further improves search efficiency during attribute grouping. Second, attribute group weights are defined by the cumulative sum of information entropy, which reflects the differences among attribute groups. Third, the outlier score function is redefined on the basis of these weights, and an attribute group weight-based outlier detection algorithm for categorical data is proposed. Finally, experiments on UCI, NTU, KEEL and synthetic datasets show that the algorithm achieves high detection accuracy and efficiency as well as good extensibility and scalability. This makes it applicable to outlier detection on high-dimensional, large-scale categorical datasets.
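The abstract does not state the paper's formulas. The Python sketch below only illustrates the general pipeline under our own assumptions, so it should not be read as the authors' definitions.

The assumptions are as follows:
- The attribute groups are given in advance; the deviation-factor-driven merging step is omitted.
- The code length of a value pattern is the Shannon length -log2(frequency / n).
- A group's weight is the cumulative sum of its attributes' entropies, normalized so the weights sum to 1.
- An object's outlier score is the weighted sum of its pattern code lengths across groups.

All function names are hypothetical.

```python
import math
from collections import Counter


def pattern_code_lengths(data, group):
    """Code length (bits) of each row's value pattern on one attribute group.

    Assumption: Shannon-optimal length -log2(freq / n), so rare patterns
    get long codes and frequent patterns get short ones.
    """
    n = len(data)
    patterns = [tuple(row[a] for a in group) for row in data]
    counts = Counter(patterns)
    return [-math.log2(counts[p] / n) for p in patterns]


def attribute_entropy(data, attr):
    """Shannon entropy (bits) of a single categorical attribute."""
    n = len(data)
    counts = Counter(row[attr] for row in data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def group_weights(data, groups):
    """Hypothetical group weight: cumulative entropy sum, normalized to 1."""
    sums = [sum(attribute_entropy(data, a) for a in g) for g in groups]
    total = sum(sums)
    if total == 0:  # every attribute is constant: fall back to uniform weights
        return [1.0 / len(groups)] * len(groups)
    return [s / total for s in sums]


def outlier_scores(data, groups):
    """Weighted sum of per-group pattern code lengths for every row."""
    weights = group_weights(data, groups)
    scores = [0.0] * len(data)
    for w, g in zip(weights, groups):
        for i, length in enumerate(pattern_code_lengths(data, g)):
            scores[i] += w * length
    return scores


def rank_outliers(scores):
    """Row indices sorted from most to least outlying (stable for ties)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    rows = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "p"), ("b", "y", "q")]
    print(rank_outliers(outlier_scores(rows, [[0, 1], [2]])))  # → [3, 0, 1, 2]
```

In this toy dataset, the last row is the only object with rare patterns in both groups, so it receives the highest score. Whether higher-entropy groups should receive larger or smaller weights is itself an assumption here. The paper's weighting and its deviation factor may differ.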

Keywords: outlier detection; attribute grouping; categorical data; attribute group weight; deviation factor

CLC number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
