Improvement and Application of PageRank Algorithm for Micro-blog (times cited: 3)

Authors: Yuan Ye[1,2], Li Chen[1], Tian Lihua[1]

Affiliations: [1] School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China; [2] Sina.com Technology (China) Co., Ltd., Beijing 100000, China

Source: Computer Applications and Software (《计算机应用与软件》), 2017, No. 3, pp. 31-37 (7 pages)

Funding: National Natural Science Foundation of China (Grant No. 61403302)

Abstract: Identifying, within massive social-network data, the influential experts who produce high-quality content in each field, so as to support targeted advertising recommendation and decision-making, has become one of the pressing problems in micro-blog data mining. Starting from the user features and behavioral features of micro-blog, this paper defines rules for collecting posts and a formula for computing interaction volume. It then improves the PageRank algorithm to address two problems that arise when PageRank is used to compute micro-blog user influence: data staleness and topic irrelevance. Finally, the algorithm is implemented on both the MapReduce and the Spark parallel computing frameworks. Experimental results show that the proposed mining method achieves good accuracy and performs well under Spark, making it especially suitable for large-scale datasets.
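The abstract names PageRank as the basis for scoring user influence but does not state the modified formula. For orientation only, the following is a minimal sketch of standard weighted PageRank computed by power iteration. It assumes that a directed edge u → v carries a positive weight, such as a hypothetical count of how often u reposted or commented on v, so that influence flows toward v. The function name `weighted_pagerank`, the damping factor 0.85 and the example users are illustrative assumptions. The paper's corrections for data staleness and topic irrelevance are not reproduced here.

```python
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, tol=1e-12, max_iter=200):
    """Standard weighted PageRank via power iteration (not the paper's variant).

    edges: iterable of (src, dst, weight). The weight is assumed to be a
    positive interaction count; this is a hypothetical stand-in for the
    paper's interaction-volume formula, which the abstract does not give.
    Returns a dict mapping node -> score; the scores sum to 1.
    """
    out_weight = defaultdict(float)   # total outgoing weight per node
    in_edges = defaultdict(list)      # dst -> [(src, weight), ...]
    nodes = set()
    for src, dst, w in edges:
        if w <= 0:
            raise ValueError("edge weights must be positive")
        out_weight[src] += w
        in_edges[dst].append((src, w))
        nodes.update((src, dst))
    if not nodes:
        return {}

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {}
        for v in nodes:
            # Each in-neighbour u passes on rank in proportion to edge weight.
            incoming = sum(rank[u] * w / out_weight[u] for u, w in in_edges[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical interaction graph: (who interacted, with whom, how often).
    demo = [
        ("alice", "bob", 3.0),
        ("alice", "carol", 1.0),
        ("carol", "bob", 1.0),
        ("bob", "alice", 1.0),
        ("dave", "bob", 2.0),
    ]
    for user, score in sorted(weighted_pagerank(demo).items(), key=lambda kv: -kv[1]):
        print(f"{user}: {score:.3f}")
```

In the demo, "bob" receives the most weighted interactions and ends up with the highest score. Each iteration consists of a per-edge step (compute a node's contribution to each neighbour) followed by a per-node sum. That pattern is what makes the computation a natural fit for map/reduce-style frameworks such as the MapReduce and Spark implementations the abstract mentions.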

Keywords: micro-blog; user influence; PageRank; Spark; big data

CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
