Parallel Web Mining System Based on Cloud Platform  Times cited: 1

Authors: Shengmei Luo, Qing He, Lixia Liu, Xiang Ao, Ning Li, Fuzhen Zhuang

Affiliations: [1] Pre-Research Department of ZTE; [2] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; [3] Graduate University of Chinese Academy of Sciences

Source: ZTE Communications, 2012, No. 4, pp. 45-53 (9 pages). Chinese journal title: 中兴通讯技术 (英文版), i.e., the English edition of ZTE Technology Journal

Funding: Supported by the National Natural Science Foundation of China (Nos. 61175052, 60975039, 61203297, 60933004, 61035003); the National High-tech R&D Program of China (863 Program) (No. 2012AA011003); and the ZTE research fund for the Parallel Web Mining project

Abstract: Traditional machine-learning algorithms are struggling to handle the exceedingly large amount of data being generated by the internet. In real-world applications, there is an urgent need for machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing involves the delivery of computing and storage as a service to a heterogeneous community of recipients, and it has recently attracted much interest in both industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that includes three subsystems: a parallel web crawler; parallel text extract, transform and load (ETL) and modeling; and parallel text mining and applications. The complete system enables various real-world web-mining applications on mass data.

Keywords: web mining; large scale; high volume; high dimension; cloud computing

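CLC classification: TP393.09 [Automation and Computer Technology: Computer Application Technology]; TP391.1 [Automation and Computer Technology: Computer Science and Technology]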