Distributed Greedy EM Algorithm Based on MapReduce  Cited by: 1

Greedy EM algorithm based on MapReduce framework


Authors: Cao Jiaqing (曹家庆); Wu Guanmao (吴观茂) [1] (School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China)

Affiliation: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, Anhui, China

Source: Information Technology and Network Security (《信息技术与网络安全》), 2018, No. 5, pp. 84-87, 92 (5 pages)

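Funding: National Natural Science Foundation of China (61471004); Anhui University of Science and Technology Graduate Innovation Fund Project (2017CX2045)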

Abstract: To address the sharp slowdown in convergence that a greedy EM algorithm suffers on large-scale data sets, this paper proposes a greedy EM algorithm based on MapReduce. The algorithm first uses Map to distribute the data, processing it on each node and generating the corresponding key-value pairs. It then uses Reduce to merge the generated key-value pairs. Finally, it obtains the optimal Gaussian mixture model and, from that model, the number of model components. Compared with the traditional EM algorithm and the greedy EM algorithm, the experimental results show that the proposed algorithm markedly improves convergence speed while still accurately recovering the number of components of the Gaussian mixture model.
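The Map and Reduce steps described in the abstract can be illustrated with a minimal sketch of one EM iteration for a one-dimensional Gaussian mixture. This is not the authors' implementation: the abstract does not specify the key-value layout, so the choice of the component index as key and the partial sufficient statistics as value is an assumption, as are all function names (`map_partition`, `reduce_pairs`, `em_iteration`). The greedy step that adds components one at a time is omitted, and the "nodes" are simulated as in-memory lists.

```python
import math
from collections import defaultdict


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def map_partition(points, weights, means, variances):
    """Map (assumed layout): run the E-step on one data split and emit
    (component index, (r, r*x, r*x^2)) pairs, r being the responsibility."""
    pairs = []
    for x in points:
        dens = [w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        for k, d in enumerate(dens):
            r = d / total
            pairs.append((k, (r, r * x, r * x * x)))
    return pairs


def reduce_pairs(pairs):
    """Reduce: sum the partial statistics for each component key."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for k, (r, rx, rxx) in pairs:
        sums[k][0] += r
        sums[k][1] += rx
        sums[k][2] += rxx
    return sums


def em_iteration(partitions, weights, means, variances):
    """One EM iteration: map over every split, reduce, then apply the M-step."""
    pairs = []
    for part in partitions:  # each split stands in for one worker node
        pairs.extend(map_partition(part, weights, means, variances))
    sums = reduce_pairs(pairs)
    n = sum(s[0] for s in sums.values())
    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        r, rx, rxx = sums[k]
        mean = rx / r
        new_w.append(r / n)
        new_m.append(mean)
        new_v.append(max(rxx / r - mean ** 2, 1e-6))  # floor keeps variance positive
    return new_w, new_m, new_v


# Usage: two splits of toy data, two components with rough initial guesses.
partitions = [[0.0, 0.2, -0.1], [5.0, 5.1, 4.9]]
weights, means, variances = [0.5, 0.5], [1.0, 4.0], [1.0, 1.0]
for _ in range(20):
    weights, means, variances = em_iteration(partitions, weights, means, variances)
```

Because the reducer only sums per-component statistics, the amount of data it receives per key does not depend on how the points were split, which is what makes the E-step straightforward to distribute.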

Keywords: greedy EM algorithm; machine learning; data mining; MapReduce framework

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
