检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:陈亚中 李振军 李荣华 毛睿[1] 乔少杰[2] CEHN Yazhong;LI Zhenjun;LI Ronghua;MAO Rui;QIAO Shaojie(College of Computer Science&Software Engineering,Shenzhen University,Shenzhen,Guandong 518060,China;Chengdu University of Information Technology,Chengdu 610103,China)
机构地区:[1]深圳大学计算机与软件学院,广东深圳518060 [2]成都信息工程大学,成都610103
出 处:《计算机工程与应用》2019年第1期76-83,114,共9页Computer Engineering and Applications
基 金:国家自然科学青年基金(No.61402292);国家自然科学基金广东省联合基金(No.U1301252)
摘 要:近年来,图数据聚类在学术界引起了广泛的关注,许多优秀的聚类方法,如模块度优化算法、谱聚类,以及基于密度的聚类算法在图数据上取得了很好的效果。SCAN是一种著名的基于密度的图聚类算法,该算法不仅能够找出图中的聚类,而且还能够发现不同聚类间的Hub节点,以及图中的离群点。然而,该算法存在两方面的局限性:首先,在大规模图数据上,该算法需要耗费大量的时间用于计算图中每条边的结构相似性;另一方面,该算法存在两个参数ε和μ,并且对这两个参数比较敏感。为了解决其局限性,提出了一种基于OpenMP的并行算法来求解节点相似性,并且提出了两种有效的负载均衡策略;其次,提出一种基于三角形的新型图结构聚类算法TSCAN。该模型能够有效降低算法对参数的敏感性,而且还能够发现重叠以及更稠密的社区。在多个大规模数据集上实验发现,基于多核的并行算法能够达到近乎线性的加速比,而且TSCAN算法对参数不敏感,能有效发现重叠社区。Recently, graph clustering has been attracted much attention in the research community. There are a number of clustering methods, such as modularity optimization algorithm, spectral clustering algorithm, as well as density-based clustering algorithm, that are proved to be useful for graph data. Among those algorithms, the SCAN algorithm is a well-known density-based algorithm for graph data. SCAN is not only able to find clusters, but it also identifies hub nodes and outliers. The SCAN algorithm, however, has two limitations. Firstly, it is very costly to compute the structural similarity for each edge in a massive graph. Secondly, SCAN is sensitive to its parameters ε and μ. To overcome these two limitations, this paper proposes an OpenMP-based parallel algorithm with two carefully-designed load-balancing techniques to compute the similarities efficiently, and a novel triangle-based graph structural clustering algorithm, called TSCAN is proposed. The striking feature of the TSCAN algorithm is that it is not sensitive to the parameters, and it is also capable of finding overlapping and dense communities. Finally, this paper conducts extensive experiments over several real-world datasets to evaluate the proposed algorithms. The results indicate that parallel implementation can achieve near-linear speedup, and the TSCAN algorithm is robust to its parameters and can identify overlapping communities.
关 键 词:社区探测 结构聚类算法 重叠社区 OPENMP 并行算法
分 类 号:TP302.8[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.228