Attribute reduction for high-dimensional data based on bi-view of similarity and difference


Authors: LI Yuanjiang; QUAN Jinsheng; TAN Yangyi; YANG Tian (Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing (Hunan Normal University), Changsha, Hunan 410081, China)

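Affiliation: [1] Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing (Hunan Normal University), Changsha 410081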

Source: Journal of Computer Applications, 2023, No. 5, pp. 1467-1472 (6 pages)

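Funding: Hunan Provincial Natural Science Foundation for Excellent Young Scholars (2021JJ20037); Changsha Outstanding Innovative Youth Training Program (kq1905031).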

Abstract: To address the curse of dimensionality caused by excessively high data dimension and redundant information, a high-dimensional Attribute Reduction algorithm based on Similarity and Difference Matrix (ARSDM) was proposed. Building on the discernibility matrix, the algorithm adds a similarity measure for samples of the same class, so that all samples are evaluated together. Firstly, the distances between samples under each attribute were calculated, and the same-class similarity and different-class difference were obtained from these distances. Secondly, a similarity and difference matrix was established to form an evaluation of the entire dataset. Finally, attribute reduction was performed: each column of the similarity and difference matrix was summed, the feature with the largest value was added to the reduct in turn, and the row vectors of the corresponding sample pairs were set to zero vectors. Experimental results show that, compared with the classical attribute reduction algorithms DMG (Discernibility Matrix based on Graph theory), FFRS (Fitting Fuzzy Rough Sets) and GBNRS (Granular Ball Neighborhood Rough Sets), the average classification accuracy of ARSDM increased by 1.07, 6.48 and 8.92 percentage points respectively under the Classification And Regression Tree (CART) classifier, and by 1.96, 11.96 and 12.39 percentage points respectively under the Support Vector Machine (SVM) classifier. ARSDM also outperforms GBNRS and FFRS in running efficiency. These results indicate that ARSDM can effectively remove redundant information and improve classification accuracy.
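The three steps described in the abstract can be illustrated with a rough sketch. The abstract does not give the exact distance, similarity or row-zeroing formulas, so everything specific below is an assumption made for illustration: the function name `arsdm_sketch`, min-max scaling with absolute difference as the per-attribute distance, similarity taken as one minus distance, and a hypothetical threshold `tau` that decides which sample pairs count as handled by the selected attribute. It should not be read as the authors' implementation.

```python
import numpy as np

def arsdm_sketch(X, y, tau=0.5):
    """Greedy selection over a similarity-and-difference matrix (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, m = X.shape

    # Assumption: min-max scale each attribute so distances lie in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    # Step 1: per-attribute distance for every unordered sample pair.
    i, j = np.triu_indices(n, k=1)
    dist = np.abs(Xs[i] - Xs[j])            # shape: (pairs, attributes)

    # Step 2: same-class pairs are scored by similarity (1 - distance),
    # different-class pairs by difference (distance).
    same = (y[i] == y[j])[:, None]
    M = np.where(same, 1.0 - dist, dist)

    # Step 3: repeatedly pick the column with the largest sum, then zero
    # the rows of the pairs that the chosen attribute already handles.
    reduct = []
    while M.any():
        best = int(np.argmax(M.sum(axis=0)))
        reduct.append(best)
        covered = M[:, best] >= tau         # assumption: threshold rule
        M[covered] = 0.0
        M[:, best] = 0.0                    # never select an attribute twice
    return reduct
```

For a toy input such as `X = [[0, 0], [0, 1], [1, 0], [1, 1]]` with labels `y = [0, 0, 1, 1]`, attribute 0 separates the classes perfectly and the sketch returns `[0]`. Note that under these assumed formulas a constant attribute still scores highly on same-class pairs, so a faithful implementation would need the paper's actual definitions.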

Keywords: similarity and difference matrix; discernibility matrix; attribute reduction; rough set; granular computing; data mining

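Classification numbers: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP311.13 [Automation and Computer Technology: Control Science and Engineering]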

 
