检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:赵倩倩 吕敏[1] 许胤龙[1] ZHAO Qian-qian;LV Min;XU Yin-long(Anhui Key Laboratory on High Performance Computing,School of Computer Science and Technology,University of Science and Technology of China,Hefei 230026,China)
机构地区:[1]中国科学技术大学计算机科学与技术学院高性能计算安徽省重点实验室,合肥230026
出 处:《计算机科学》2019年第3期314-320,共7页Computer Science
基 金:国家自然科学基金面上项目(61672486)资助
摘 要:graphlets是指大规模网络中节点数目较少的连通诱导子图,在社交网络和生物信息学领域有着广泛的应用。由于精确计数的计算成本较高,目前大多采用随机游走采样算法来近似估计graphlets的频率。随着节点数目的增多,graphlets的种类数增长迅速且结构变化复杂,快速估计大规模网络中所有种类的graphlets的频率是一项挑战。文中提出了基于两种子结构的随机游走采样算法CSRW2来估计graphlets频率,即给定graphlets节点数k(k=4,5),通过采样k-graphlets的子结构(k-1)-path和3-star得到两种样本,之后用比例放大法综合,以高效估计graphlets并适应graphlets结构的复杂变化。实验结果表明,CSRW2能以统一的框架估计所有k-graphlets类型的频率,其估计精度优于现有代表性算法,更适用于频率较低且结构较稠密的graphlets。例如,用CSRW2估计真实网络sofb-Penn94中的5-graphlets,当样本数为2万时,标准均方根误差的平均值由WRW算法的0.8降低至CSRW2算法的0.22左右。Graphlets refer to the connected induced subgraphs with small amount of nodes in large-scale network,and have extensive applications in social networks and bioinformatics.Due to extremely high computational costs of exactly counting graphlets,approximately estimating graphlets concentrations via random walk sampling algorithms already becomes the mainstream approach.As node size k increases,the number of k -graphlets increases rapidly and their structures change dramatically,so it is a challenge to quickly estimate the relative frequency of all types of graphlets (graphlet concentrations) in a large-scale network.Aiming at this problem,this paper proposed a novel sampling algorithm,namely common substructures path and 3-star based graphlets sampling via random walk (CSRW2),to efficiently estimate graphlets concentrations.Given k ( k =4,5),apart from sampling path via random walk,CSRW2 also samples ano- ther substructure 3-star,and then derives graphlets concentrations by proportional amplification to find the dense graphlets with less appearance more efficiently and adapt to the complex structural changes.Experimental evaluations on real networks demonstrate that CSRW2 can estimate k -graphlets in a uniform framework.CSRW2 outperforms the representative methods in terms of accuracy and is more accurate for the k -graphlets with more edges and less appearances in graphs.For example,when 5-graphlets in sofb-Penn94 is estimated,the average NRMSE of all 5-graphlets is decreased to 0.22 via CSRW2 in contrast to 0.8 obtained by WRW.
关 键 词:社交网络 Graphlet Graphlet频率 随机游走 采样算法 无偏估计
分 类 号:TP393[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:3.23.59.191