Preprocessing Method of Text Mining Based on Hadoop Platform

Cited by: 1

Author: ZHANG Aike [1] (Liuzhou Vocational and Technical College, Liuzhou 545006, China)

Source: Journal of Shanghai University of Engineering Science, 2017, No. 2, pp. 115-119 (5 pages)

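Funding: Scientific Research Funding Project of the Guangxi Department of Education (201204LX593); Guangxi Basic Ability Improvement Project for Young and Middle-aged Teachers (KY2016LX516)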

Abstract: With the rapid development of the information society, network data are growing exponentially, and most of these data are text. Completing the mining and analysis of large-scale text data within a limited time has become an active research topic. Text preprocessing is the most time-consuming step in the whole mining process, and distributed parallel processing can shorten it. This paper designs and analyzes a MapReduce parallelization of text preprocessing on the cloud computing Hadoop platform and describes the preprocessing Map and Reduce functions in detail. Experimental results show that the parallelized method performs better than single-node execution.
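The abstract does not reproduce the paper's Map and Reduce functions, so the following is only a minimal sketch of how a text preprocessing step might be split into the two phases. It assumes simple regex tokenization, a tiny hypothetical stop-word list, and per-document term counting as the output; `map_fn`, `reduce_fn`, `run_job`, and `STOP_WORDS` are illustrative names, and the shuffle phase is simulated locally rather than run on Hadoop.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real job would load a full list from a file.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def map_fn(doc_id, text):
    """Map step: tokenize one document, drop stop words, emit ((doc_id, term), 1)."""
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOP_WORDS:
            yield (doc_id, token), 1


def reduce_fn(key, counts):
    """Reduce step: sum the counts emitted for one (doc_id, term) key."""
    return key, sum(counts)


def run_job(documents):
    """Simulate the shuffle phase locally by grouping map output by key."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


# Two toy documents standing in for a large text corpus.
docs = {
    "d1": "The growth of text data is the growth of the web",
    "d2": "Text mining in the cloud",
}
term_counts = run_job(docs)
```

Because each document is mapped independently and each key is reduced independently, both phases could in principle be distributed across nodes, which is the property the abstract relies on for its speedup over single-node execution.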

Keywords: cloud computing; Hadoop platform; text mining; text preprocessing; distributed parallel processing

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
