Parallel knowledge reduction algorithm using knowledge granularity (知识粒度框架下并行知识约简算法研究)

Cited by: 1


Authors: Lü Ping (吕萍); Chang Yuhui (常玉慧); Qian Jin (钱进)

Affiliations: [1] School of Computer Engineering, Jiangsu University of Technology, Changzhou 213001, China; [2] School of Software, East China Jiaotong University, Nanchang 330013, China

Source: Journal of Nanjing University (Natural Science), 2022, No. 4, pp. 594-603 (10 pages)

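Funding: National Natural Science Foundation of China (62066014); Jiangsu Province "Qing Lan Project"; Jiangxi Province "Double Thousand Plan"; Natural Science Foundation of Jiangxi Province (20202BABL202018).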

Abstract: Knowledge reduction for large-scale data has become a research focus in rough set theory in recent years. Classical knowledge reduction algorithms usually load a small dataset into the main memory of a single machine in one pass, so they cannot handle massive data. In addition, the choice of attribute uncertainty measure affects the efficiency of parallel knowledge reduction algorithms. To address these issues, this paper systematically studies the differences and connections among classical uncertainty measures from the perspective of knowledge granularity. Map and Reduce functions that combine data parallelism and task parallelism are designed to compute, for different candidate attribute subsets, the induced equivalence classes and the uncertainty (attribute significance) of each subset. A parallel algorithm model for knowledge reduction under the knowledge granularity framework is then constructed with MapReduce; it computes a reduct for algorithms based on the relative discernibility relation, the relative indiscernibility relation, and the complementary conditional entropy. Experiments on the Hadoop platform show that the proposed parallel knowledge reduction algorithms can process massive datasets effectively.
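To make the Map and Reduce design described in the abstract more concrete, here is a minimal single-process Python sketch. It is not the authors' implementation, and the abstract does not give their exact formulas. It assumes one common definition of relative knowledge granularity, GK(D|B) = GK(B) - GK(B ∪ D) with GK(B) = Σ|X_i|² / |U|², and all function names (`map_record`, `reduce_class`, `relative_granularity`) and the toy decision table are hypothetical. Task parallelism is modeled by tagging each emitted key with its candidate attribute; data parallelism corresponds to mapping each record independently.

```python
from collections import defaultdict


def map_record(record, reduct, candidates, decision):
    """Map step (assumed design): for every candidate attribute c, emit a key
    identifying the record's equivalence class under reduct + [c]."""
    for c in candidates:
        attrs = reduct + [c]
        class_key = tuple(record[a] for a in attrs)
        # The candidate name in the key keeps the parallel tasks separate.
        yield (c, class_key), record[decision]


def reduce_class(key, decisions):
    """Reduce step (assumed design): for one equivalence class X, return
    |X|^2 and the sum over decision values j of |X ∩ D_j|^2."""
    counts = defaultdict(int)
    for d in decisions:
        counts[d] += 1
    size = sum(counts.values())
    return key[0], size * size, sum(n * n for n in counts.values())


def relative_granularity(records, reduct, candidates, decision):
    """Simulate map, shuffle, and reduce in one process and return
    GK(D | reduct + [c]) for each candidate attribute c."""
    # Shuffle: group mapped values by key, as the framework would.
    groups = defaultdict(list)
    for r in records:
        for key, value in map_record(r, reduct, candidates, decision):
            groups[key].append(value)

    # Aggregate the per-class contributions for each candidate.
    totals = defaultdict(lambda: [0, 0])
    for key, values in groups.items():
        c, sq, sq_d = reduce_class(key, values)
        totals[c][0] += sq
        totals[c][1] += sq_d

    n = len(records)
    return {c: (sq - sq_d) / (n * n) for c, (sq, sq_d) in totals.items()}


# Toy decision table (hypothetical): attribute "b" determines decision "d".
records = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 1},
    {"a": 1, "b": 0, "d": 0},
    {"a": 1, "b": 1, "d": 1},
]

scores = relative_granularity(records, [], ["a", "b"], "d")
print(scores)  # {'a': 0.25, 'b': 0.0}
```

In a greedy reduction loop, the candidate with the smallest remaining relative granularity (here `b`) would be added to the reduct, and the process would repeat until the value matches that of the full condition attribute set. On Hadoop, the grouping step would be carried out by the framework's shuffle rather than by an in-memory dictionary.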

Keywords: MapReduce; knowledge reduction; data parallelism; task parallelism; knowledge granularity

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
