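Author: Zhu Jinshan (朱金山) [1]
Source: Journal of Jining Normal University (《集宁师范学院学报》), 2017, No. 6, pp. 37-41 (5 pages)
Funding: Zhejiang Province Education Science Planning Project "Building a University Public Opinion Analysis System Based on an Integrated Wired and Wireless Network" (project no. 2017SCG228)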
Abstract: Sensitive word analysis of network traffic is the key part of a public opinion monitoring system. This paper introduces Spark, Flume, Kafka and other open-source components used in the system architecture. It analyzes the two HanLP components that sensitive word analysis relies on most, Chinese word segmentation and named entity recognition, together with the algorithmic principles of similarity judgement using word vectors trained with Word2vec, and compares their time complexity. Based on the traffic characteristics of university network users, it proposes an architecture design for a public opinion monitoring system. Finally, it presents a prototype implementation of the system, discusses it, and gives an outlook on future work.
Keywords: sensitive word analysis; Spark; Flume; Kafka; HanLP; Word2vec
Classification: TP391 [Automation and Computer Technology - Computer Application Technology]
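The abstract mentions similarity judgement on Word2vec word vectors but does not say which measure the paper uses. As a minimal sketch, assuming cosine similarity (a common choice with Word2vec, not confirmed by this record), a token can be flagged when its vector lies close to that of a known sensitive word. The function names, the threshold of 0.8, and the two-dimensional toy vectors are hypothetical placeholders, not values from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_u * norm_v)


def flag_similar(tokens, vectors, sensitive_words, threshold=0.8):
    """Return (token, sensitive_word) pairs whose similarity meets the threshold.

    `vectors` maps a word to its embedding. In a real pipeline the tokens
    would come from a segmenter and the embeddings from a trained model;
    here both are supplied by the caller (an assumption of this sketch).
    """
    hits = []
    for token in tokens:
        if token not in vectors:
            continue  # out-of-vocabulary tokens cannot be compared
        for word in sensitive_words:
            if word in vectors and cosine_similarity(vectors[token], vectors[word]) >= threshold:
                hits.append((token, word))
                break  # one match per token is enough to flag it
    return hits


# Toy embeddings, invented for illustration only.
toy_vectors = {
    "protest": [0.9, 0.1],
    "rally": [0.8, 0.2],
    "lunch": [0.0, 1.0],
}

flagged = flag_similar(["rally", "lunch"], toy_vectors, ["protest"])
```

With these toy vectors, "rally" is flagged as close to "protest" while "lunch" is not. The nested loop costs one comparison per token per sensitive word, each linear in the vector dimension.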