Large-scale Network Community Detection Algorithm Based on MapReduce

Cited by: 2

Authors: WANG Hancheng, DAI Haipeng[1], CHEN Zhipeng, CHEN Shusen, CHEN Guihai[1] (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China)

Affiliation: [1] State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023

Source: Computer Science, 2024, Issue 4, pp. 11-18 (8 pages)

Funding: National Natural Science Foundation of China (62272223, U22A2031, 61872178).

Abstract: Community detection is a fundamental problem in social network mining. With the rapid growth of massive data, traditional community detection algorithms increasingly struggle to handle large-scale social networks, so designing efficient community detection algorithms for such networks is of great significance. This paper proposes a new distributed algorithm based on MapReduce and k-center clustering. First, the algorithm introduces a "friend circle coefficient" technique that measures the distance between nodes more accurately. Second, it introduces a "two-stage k-center clustering" technique that incorporates node-centrality heuristics into the selection of center points, significantly improving the modularity of the output. Finally, it introduces a "community fusion with modularity as the optimization objective" technique that automatically determines the number of communities in the network without prior knowledge. Experimental results show that the proposed algorithm achieves markedly higher modularity than state-of-the-art community detection algorithms. For example, compared with the LPA (label propagation algorithm), it increases modularity by an average of 9.19 times.
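The third technique in the abstract merges communities for as long as modularity improves, which is how the number of communities can be left open. The paper's exact fusion rule is not given here, so the following is only a minimal, hypothetical sketch. It assumes an undirected, unweighted graph without self-loops and some initial partition (for example, the output of a clustering step), uses the standard Newman-Girvan modularity Q = Σ_c [L_c/m − (d_c/2m)²], and greedily merges the adjacent pair of communities with the largest gain ΔQ = e_ab/m − d_a·d_b/(2m²). The function names `modularity` and `fuse_communities` are illustrative, not from the paper, and the sketch runs on a single machine rather than on MapReduce.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Newman-Girvan modularity Q = sum_c [L_c/m - (d_c/(2m))^2].

    Assumes an undirected, unweighted graph given as a list of (u, v) edges
    without self-loops or duplicates; membership maps node -> community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def fuse_communities(edges, membership):
    """Hypothetical greedy fusion: repeatedly merge the pair of adjacent
    communities with the largest positive modularity gain, stopping when no
    merge increases Q, so the number of communities is not fixed in advance."""
    m = len(edges)
    membership = dict(membership)
    while True:
        degree = defaultdict(int)
        between = defaultdict(int)  # e_ab: edges running between communities a and b
        for u, v in edges:
            a, b = membership[u], membership[v]
            degree[a] += 1
            degree[b] += 1
            if a != b:
                between[frozenset((a, b))] += 1
        best_gain, best_pair = 0.0, None
        for pair, e_ab in between.items():
            a, b = tuple(pair)
            # Delta Q of merging a and b: e_ab/m - d_a*d_b/(2m^2)
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no merge improves modularity
            return membership
        keep, drop = best_pair
        membership = {n: keep if c == drop else c for n, c in membership.items()}


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    start = {n: n for n in range(6)}  # singleton communities as the initial partition
    fused = fuse_communities(edges, start)
    groups = defaultdict(list)
    for node, c in fused.items():
        groups[c].append(node)
    print(sorted(sorted(g) for g in groups.values()))  # [[0, 1, 2], [3, 4, 5]]
    print(round(modularity(edges, fused), 4))          # 0.3571
```

Starting from singletons on two triangles joined by one edge, the greedy merges recover the two triangles and stop there, because merging them would lower Q from 5/14. Only adjacent communities are considered, since a pair with no connecting edges always has a negative gain.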

Keywords: community detection; k-center clustering; distributed computing; data mining; big data

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
