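Authors: Zhai Xiaofang (翟晓芳)[1], Liu Quanming (刘全明)[1], Cheng Yaodong (程耀东)[2], Hu Qingbao (胡庆宝)[2], Li Haibo (李海波)[2]
Affiliations: [1] School of Computer and Information Technology, Shanxi University, Taiyuan 030006; [2] Computing Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049
Source: Computer Engineering (《计算机工程》), 2015, No. 7, pp. 31-35 (5 pages)
Funding: National "863" Program project "Mass Information Consumption Service Platform and Application Demonstration Based on Media Big Data" (SS2014AA012305)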
Abstract: As a new medium for spreading information, microblogs have surpassed traditional mainstream media in both influence and propagation speed, so predicting microblog hotness matters for public opinion monitoring, government publicity, corporate marketing, and hot-topic recommendation. By analyzing the hierarchical pattern of microblog reposting and combining repost count, repost depth, and repost breadth, this paper defines a new method for calculating a hotness index. Microblog hotness is divided into five levels, and for microblogs with more than 100 reposts the model predicts the probability that their hotness reaches a given level. Using a supervised machine learning algorithm, static features and then dynamic features are extracted from the training samples to train the hotness prediction model. The training samples come from Sina microblog and were collected with the authors' self-developed BigData open crawler platform. Experiments using 10-fold cross-validation show that, compared with a hotness prediction model that uses only static features, adding dynamic microblog features effectively improves prediction performance, with the average F1 value reaching 76.9%.
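Keywords: microblog; crawler; static features; dynamic features; hotness index; multi-class classification problem
Classification number: TP311.5 [Automation and Computer Technology: Computer Software and Theory]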
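
The evaluation protocol named in the abstract (10-fold cross-validation scored by F1 over five hotness levels) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the "average F1" is averaged, so the sketch assumes a macro average over classes that is then averaged over folds. The level numbering 1 to 5, the `train_fn` interface, and the `majority_baseline` stand-in classifier are all hypothetical, and the hotness index formula itself is not given in this record, so it is not reproduced here.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def cross_validate(features, levels, train_fn, k=10, labels=(1, 2, 3, 4, 5)):
    """Mean macro-F1 over k folds.

    train_fn(X_train, y_train) must return a predict(x) callable;
    this interface is hypothetical, chosen only for the sketch.
    """
    fold_scores = []
    for held_out in k_fold_indices(len(features), k):
        held = set(held_out)
        train_idx = [i for i in range(len(features)) if i not in held]
        predict = train_fn([features[i] for i in train_idx],
                           [levels[i] for i in train_idx])
        y_true = [levels[i] for i in held_out]
        y_pred = [predict(features[i]) for i in held_out]
        fold_scores.append(macro_f1(y_true, y_pred, labels))
    return sum(fold_scores) / k


def majority_baseline(X_train, y_train):
    """Placeholder classifier: always predicts the most frequent training level."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda x: most_common
```

Any supervised classifier that fits the `train_fn` shape could replace `majority_baseline`; comparing a model trained on static features alone against one trained on static plus dynamic features would then amount to calling `cross_validate` twice with different feature vectors.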