Research on a Hybrid Hotspot Detection Algorithm Based on PageRank (基于PageRank的热点发现混合算法研究)

Cited by: 4


Authors: YING Yi [1]; HUANG Hui [1]; LIU Ding-yi [1] (School of Computer Science and Technology, Sanjiang University, Nanjing 210012, China)

Affiliation: [1] School of Computer Science and Engineering, Sanjiang University

Source: Computer Technology and Development (《计算机技术与发展》), 2019, No. 9, pp. 81-85 (5 pages)

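Funding: Philosophy and Social Science Research Fund Project of Jiangsu Universities (2018SJA0506); Jiangsu Universities "Qing Lan Project" funding (Su Jiao Shi [2018] No. 12); Natural Science Research Project of Jiangsu Higher Education Institutions (18KJB520042)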

Abstract: Hot topic discovery in social networks is a fundamental research problem in public opinion analysis and prediction. Traditional text analysis methods based on clustering and classification are not well suited to mining online public opinion, and the classical PageRank algorithm considers only the link structure between web pages. To evaluate public opinion hotspots more accurately and comprehensively from multiple angles, this paper takes three key indicators of hotspot discovery into account (user social status, post similarity index, and heat index) and proposes a hybrid hotspot detection algorithm based on PageRank and similarity calculation (HDH-PRSC). The social status value of a user is obtained by applying PageRank to the link graph formed by microblog users and their followers; the similarity index of a post is computed by combining the TF-IDF algorithm with cosine similarity; and the heat index of a post is derived from its repost, comment, and like counts. The final heat score of a post is the sum of the user's social status value, the post's similarity index, and its heat index. Experiments on Sina Weibo data show that HDH-PRSC discovers hot topics more reasonably and effectively.
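The abstract describes HDH-PRSC only at a high level, so the following is a minimal Python sketch of how the three indices might be computed and summed, not the authors' implementation. Several details are assumptions made for illustration: the function names, the damping factor of 0.85, the smoothed IDF formula, taking a post's similarity index as its mean cosine similarity to the other posts, and scaling each index by its maximum before adding. The sample users and posts are invented.

```python
import math
from collections import Counter

def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank; follows maps each user to the users they follow."""
    users = set(follows) | {v for targets in follows.values() for v in targets}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in users}
        for u, targets in follows.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank

def tfidf_vectors(token_lists):
    """Sparse TF-IDF vectors; the smoothed IDF here is an assumption."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        length = len(tokens) or 1
        vectors.append({t: (c / length) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_index(token_lists):
    """Assumed definition: mean cosine similarity of each post to all others."""
    vecs = tfidf_vectors(token_lists)
    n = len(vecs)
    if n < 2:
        return [0.0] * n
    return [sum(cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

def normalize(values):
    """Scale values by their maximum (an assumption; the abstract gives no scaling)."""
    top = max(values, default=0.0)
    return [v / top if top else 0.0 for v in values]

def hot_scores(posts, follows):
    """Sum of social status, similarity index, and heat index for each post."""
    rank = pagerank(follows)
    status = normalize([rank.get(p["author"], 0.0) for p in posts])
    similarity = normalize(similarity_index([p["tokens"] for p in posts]))
    heat = normalize([p["reposts"] + p["comments"] + p["likes"] for p in posts])
    return [s + m + h for s, m, h in zip(status, similarity, heat)]

# Invented toy data: alice and bob follow carol, carol follows alice.
follows = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
posts = [
    {"author": "carol", "tokens": ["flood", "rescue", "city"],
     "reposts": 120, "comments": 40, "likes": 300},
    {"author": "alice", "tokens": ["flood", "rescue", "volunteers"],
     "reposts": 30, "comments": 10, "likes": 50},
    {"author": "bob", "tokens": ["lunch", "noodles"],
     "reposts": 1, "comments": 0, "likes": 2},
]
scores = hot_scores(posts, follows)
for post, score in zip(posts, scores):
    print(post["author"], round(score, 3))
```

Scaling each index to the range 0 to 1 before summing keeps any single term, such as raw like counts, from dominating the total. The abstract does not say how the paper balances the three terms, so this choice should be treated as a placeholder. Chinese text would also need a word segmenter to produce the token lists, which the sketch takes as given.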

Keywords: PageRank; user social status; similarity index; heat index

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
