Authors: Zang Yanhui [1], Zhao Xuezhang [1], Xi Yunjiang [2]
Affiliations: [1] Foshan Polytechnic, Foshan, Guangdong 528137, China; [2] South China University of Technology, Guangzhou 510641, China
Source: Application Research of Computers (《计算机应用研究》), 2019, No. 12, pp. 3705-3708, 3712 (5 pages)
Funding: National Natural Science Foundation of China (71371077); Foshan Science and Technology Plan Project (2015AB004241)
Abstract: To address the challenges that existing big-data computing frameworks face in scalable machine learning research, this paper proposes a distributed naive Bayes text classification method based on the MapReduce and Apache Spark frameworks. It explores the naive Bayes classifier (NBC) by studying the adaptability of MapReduce and Apache Spark, and reviews existing big-data computing frameworks. First, the training samples are divided into m classes according to the naive Bayes text classification model. In the training phase, the output of each MapReduce job serves as the input of the next, and four chained MapReduce jobs produce the model; this design makes full use of MapReduce's parallelism. Finally, at test time, the classifier returns the class label to which the maximum value belongs. In experiments on the Newsgroups dataset, the method achieves results above 99% on all five news categories, higher than the comparison algorithms in every case, which demonstrates its accuracy.
Keywords: text classification; MapReduce; Spark framework; distributed; naive Bayes classifier; machine learning
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
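The training and classification steps summarized in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation: the record does not say what each of the four MapReduce jobs computes, so the four passes below, the Laplace smoothing, the whitespace tokenizer, and every name (`map_tokens`, `reduce_sum`, `train`, `classify`, the toy `docs`) are assumptions made for illustration. Plain Python functions stand in for MapReduce jobs, with each pass consuming the output of an earlier one.

```python
import math
from collections import Counter


def map_tokens(docs):
    """Map step: emit ((label, word), 1) for every token of every labelled document."""
    for label, text in docs:
        for word in text.lower().split():
            yield (label, word), 1


def reduce_sum(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(docs):
    """Chain four map/reduce-style passes (a hypothetical split, not the paper's)."""
    # Pass 1: word counts per class.
    word_counts = reduce_sum(map_tokens(docs))
    # Pass 2: total tokens per class, computed from the output of pass 1.
    class_tokens = reduce_sum((label, n) for (label, _), n in word_counts.items())
    # Pass 3: documents per class, needed for the class priors.
    class_docs = reduce_sum((label, 1) for label, _ in docs)
    # Pass 4: convert counts to log-probabilities (Laplace smoothing is assumed).
    vocab = {word for _, word in word_counts}
    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    log_like = {
        c: {
            w: math.log((word_counts[(c, w)] + 1) / (class_tokens[c] + len(vocab)))
            for w in vocab
        }
        for c in class_docs
    }
    return log_prior, log_like


def classify(model, text):
    """Return the class label with the maximum log-posterior score."""
    log_prior, log_like = model
    scores = {}
    for c, prior in log_prior.items():
        score = prior
        for word in text.lower().split():
            # Words never seen in training are skipped.
            if word in log_like[c]:
                score += log_like[c][word]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy labelled corpus (invented for this sketch, not taken from the paper).
docs = [
    ("sports", "the team won the match"),
    ("sports", "a great goal in the match"),
    ("tech", "the new cpu and gpu"),
    ("tech", "software update for the cpu"),
]
model = train(docs)
print(classify(model, "cpu software update"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the final `max` over per-class scores corresponds to taking the class label to which the maximum value belongs. In a real deployment each pass would run as a distributed job over partitioned data rather than as an in-memory loop.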