Affiliation: [1] Information Technology Application Systems Department, North China Institute of Computing Technology, Beijing 100083
Source: Information Technology (《信息技术》), 2014, No. 7, pp. 149-153 (5 pages)
Abstract: This paper designs and implements a network public opinion analysis system based on MapReduce. The system stores its data using a dual storage mechanism composed of HDFS and HBase. MMSeg4j was selected for Chinese word segmentation after experimental analysis and a comparison of segmentation results. The Canopy-Kmeans algorithm was improved to perform automatic text clustering, which raised the system's clustering accuracy and efficiency. The system has been deployed in a public opinion analysis system for a military unit, where it detects hot topics in real time and tracks public opinion trends accurately, providing scientific and systematic information support for responding to public opinion crises and formulating public opinion policy.
Keywords: Hadoop; public opinion analysis; MapReduce; Chinese word segmentation
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]