Distributed rough set attribute reduction algorithm under Spark

Cited by: 7


Authors: ZHANG Xiajie (章夏杰), ZHU Jinghua (朱敬华) [1,2], CHEN Yang (陈杨)

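Affiliations: [1] School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; [2] Key Laboratory of Database and Parallel Computing of Heilongjiang Province, Harbin 150080, China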

Source: Journal of Computer Applications (《计算机应用》), 2020, Issue 2, pp. 518-523 (6 pages)

Funding: Natural Science Foundation of Heilongjiang Province, General Program (F2018028)

Abstract: Attribute reduction (feature selection) is an important step in data preprocessing, and most attribute reduction methods use attribute dependency as the criterion for selecting attribute subsets. A Fast Dependency Calculation (FDC) method was designed. It computes the dependency degree by directly searching for the objects that belong to the relative positive region, without first computing the relative positive region itself, and is therefore clearly faster than traditional methods. In addition, the Whale Optimization Algorithm (WOA) was improved so that it can be applied effectively to rough set attribute reduction. Combining these two methods, a Spark-based distributed rough set attribute reduction algorithm named SP-WOFRST was proposed. It was compared with another Spark-based rough set attribute reduction algorithm, SP-RST, on two synthetic large data sets. Experimental results show that SP-WOFRST outperforms SP-RST in both accuracy and speed.
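The abstract does not spell out how FDC works. The sketch below is one plausible reading of "counting positive-region objects directly", based on the standard rough set definition γ_B(D) = |POS_B(D)| / |U|. It is an illustration, not the authors' implementation. It makes a single pass that groups objects by their values on the attribute subset B and records the decision values seen in each group. It then sums the sizes of the decision-consistent groups, without ever building POS_B(D) as an explicit set of objects. The function name `dependency_degree` and the list-of-rows data layout are assumptions.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def dependency_degree(rows: Sequence[Sequence[Hashable]],
                      decisions: Sequence[Hashable],
                      attrs: Sequence[int]) -> float:
    """Hypothetical helper: gamma_B(D) = |POS_B(D)| / |U| for B = attrs."""
    n = len(rows)
    if n == 0:
        return 0.0
    # One pass: group objects by their values on B, tracking each group's
    # size and the set of decision values that occur in it.
    size = defaultdict(int)
    labels = defaultdict(set)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        size[key] += 1
        labels[key].add(d)
    # A B-equivalence class belongs to the positive region iff all of its
    # objects share one decision value; count those objects directly.
    pos = sum(size[k] for k in size if len(labels[k]) == 1)
    return pos / n
```

For example, with rows `[[0, 1], [0, 0], [1, 0], [1, 1]]` and decisions `['a', 'b', 'a', 'a']`, using both attributes gives 1.0. Using attribute 0 alone gives 0.5, because the class `{0}` mixes decisions `a` and `b`.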
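The abstract does not describe how SP-WOFRST splits the dependency computation across a Spark cluster. The following pure-Python sketch only shows why that computation distributes naturally. It makes no claim about the paper's actual design. Each data partition builds a partial table that maps a condition-value key to an object count and the set of decisions seen. Partial tables are then merged per key, which is the same associative combine step that PySpark's `RDD.reduceByKey` performs. The function names `map_partition`, `merge` and `distributed_dependency` are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

Key = Tuple[Hashable, ...]
# Partial result: condition-value key -> (object count, decision values seen)
Partial = Dict[Key, Tuple[int, FrozenSet[Hashable]]]
Record = Tuple[Sequence[Hashable], Hashable]  # (condition values, decision)


def map_partition(part: Sequence[Record], attrs: Sequence[int]) -> Partial:
    """Local work on one partition: group its objects by their values on B."""
    out: Partial = {}
    for row, d in part:
        key = tuple(row[a] for a in attrs)
        cnt, labs = out.get(key, (0, frozenset()))
        out[key] = (cnt + 1, labs | {d})
    return out


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial results per key: add counts, union decision values."""
    out = dict(a)
    for key, (cnt, labs) in b.items():
        c0, l0 = out.get(key, (0, frozenset()))
        out[key] = (c0 + cnt, l0 | labs)
    return out


def distributed_dependency(partitions: Sequence[Sequence[Record]],
                           attrs: Sequence[int]) -> float:
    merged = reduce(merge, (map_partition(p, attrs) for p in partitions), {})
    n = sum(cnt for cnt, _ in merged.values())
    # Consistency can only be judged after merging: a class may look
    # consistent inside each partition but mix decisions across partitions.
    pos = sum(cnt for cnt, labs in merged.values() if len(labs) == 1)
    return pos / n if n else 0.0
```

Because `merge` is associative and commutative, partitions can be combined in any order. This is the property that lets a cluster framework perform the reduction in parallel.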
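The abstract says WOA was "improved" for attribute reduction but does not say how. The sketch below is therefore the standard WOA update (encircling, random-whale exploration, and the bubble-net spiral), adapted to a binary search space with a sigmoid transfer function. That adaptation is a common generic technique and should not be read as the paper's modification. Several choices here are assumptions: the fitness weighting `alpha`, the position bounds, the use of scalar A and C per whale, and all function names. The fitness rewards a high dependency degree first and a smaller attribute subset second.

```python
import math
import random
from collections import defaultdict


def dependency_degree(rows, decisions, attrs):
    # gamma_B(D): share of objects in decision-consistent B-classes.
    size, labels = defaultdict(int), defaultdict(set)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        size[key] += 1
        labels[key].add(d)
    return sum(size[k] for k in size if len(labels[k]) == 1) / len(rows)


def fitness(mask, rows, decisions, alpha=0.9):
    # Hypothetical objective: weight dependency, then prefer fewer attributes.
    attrs = [i for i, bit in enumerate(mask) if bit]
    gamma = dependency_degree(rows, decisions, attrs)
    return alpha * gamma + (1 - alpha) * (1 - len(attrs) / len(mask))


def to_bits(x, rng):
    # Sigmoid transfer function: coordinate v selects its attribute w.p. sigmoid(v).
    return [1 if rng.random() < 1 / (1 + math.exp(-v)) else 0 for v in x]


def binary_woa(rows, decisions, n_whales=10, iters=30, b=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(rows[0])
    pos = [[rng.uniform(-4, 4) for _ in range(dim)] for _ in range(n_whales)]
    masks = [to_bits(x, rng) for x in pos]
    fits = [fitness(m, rows, decisions) for m in masks]
    i0 = max(range(n_whales), key=fits.__getitem__)
    best_x, best_mask, best_fit = pos[i0][:], masks[i0][:], fits[i0]
    for t in range(iters):
        a = 2 - 2 * t / iters  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            p, l = rng.random(), rng.uniform(-1, 1)
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale.
                ref = best_x if abs(A) < 1 else pos[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - x) for r, x in zip(ref, pos[i])]
            else:
                # Bubble-net spiral update around the best whale.
                new = [abs(r - x) * math.exp(b * l) * math.cos(2 * math.pi * l) + r
                       for r, x in zip(best_x, pos[i])]
            pos[i] = [max(-6.0, min(6.0, v)) for v in new]  # keep sigmoid stable
            mask = to_bits(pos[i], rng)
            f = fitness(mask, rows, decisions)
            if f > best_fit:
                best_x, best_mask, best_fit = pos[i][:], mask, f
    return [j for j, bit in enumerate(best_mask) if bit], best_fit
```

On a toy table where only attribute 0 determines the decision, any subset that contains attribute 0 reaches a dependency of 1, and the size penalty favours the smallest such subset.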

Keywords: rough set; Apache Spark; whale optimization algorithm; feature selection; attribute reduction

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
