搭建基于云计算的开源海量数据挖掘平台  被引量:11

Building the Open Source Mass Data Mining Platform Based on Cloud Computing

在线阅读下载全文

作  者:赵华茗[1] 

机构地区:[1]中国科学院国家科学图书馆,北京100190

出  处:《现代图书情报技术》2010年第10期76-81,共6页New Technology of Library and Information Service

摘  要:通过分析亚马逊弹性MapReduce(EMR)平台构架,针对信息情报机构内部数据处理的迫切需求,提出通过开源技术Xen和Hadoop平台构建基于云计算的动态可伸缩的海量数据处理平台并给出实施方案、海量文本数据处理案例和开源EMR平台的优势分析。实施方案主要分为三部分:搭建动态虚拟的云计算环境、安装制作Hadoop虚拟服务器模板、配置运行Cloudera和Cloudera Desktop。通过开源EMR架构的应用,可以有效解决服务器蔓延问题,提高网络计算资源的利用效率和分布式数据挖掘服务的快速布署能力及灵活性。Aiming to meet the internal data processing needs of information organizations, this paper, by analyzing the frameworks of Amazon Elastic Map/Reduce (EMR) platform, puts forward to build the dynamic and elastic open source mass data mining platform based on cloud computing, and provides a roadmap of successful implementation, an example of massive text data processing and the analysis of advantages of open source EMR platform. This implementation plan includes three parts: building dynamic virtual environment of cloud computing, creating the virtual server template of Hadoop, and deploying and running Cloudera and Cloudera Desktop. Through the application of open source EMR platform, the problem of server sprawl can be solved effectively, the utilization ratio of network computing resource is improved, and the rapid deployment capability and agility of distributed data processing services are enhanced.

关 键 词:云计算 海量数据挖掘 虚拟技术 分布式计算 XEN Cloudera HadooD 

分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象