跨域环境下图流三角计数算法GTC  

GTC:geo-distributed triangle counting algorithm in graph streams

在线阅读下载全文

作  者:曹春泽 马德龙 袁野 CAO Chunze;MA Delong;YUAN Ye(School of Computer Science and Engineering,Northeastern University,Shenyang Liaoning 110169,China;School of Computer Science and Technology,Beijing Institute of Technology,Beijing 100081,China)

机构地区:[1]东北大学计算机科学与工程学院,沈阳110169 [2]北京理工大学计算机学院,北京100081

出  处:《计算机应用》2023年第7期2040-2048,共9页journal of Computer Applications

基  金:国家自然科学基金资助项目(61932004,62002054)。

摘  要:现有的分布式三角计数算法假设所有计算节点位于同一地理位置,然而现实中它们可能位于跨洲际的多个数据中心中。跨域分布的数据中心使用广域网连接,具有网络带宽异质、通信费用高昂、分布不均等特点,而现有分布式算法无法适用于跨域环境。同时,现有研究较多采用随机采样、淘汰边等策略,忽略了三角形的形成具有时间局部性的特点。因此,研究了跨域环境中真实图流的三角计数问题并提出跨域三角计数(GTC)算法。首先针对现有边分发策略导致数据传输量过高的问题,提出一种跨域边分发策略,以结合通信的时间收益和数据收益建立收益公式,并使用点对点通信代替广播边;然后对于点对点通信在跨域环境中导致的三角形重复计数问题,提出终边计算规则,以确保无重复计数;最后基于时间加权采样算法提出时间加权三角计数算法,以利用三角形的时间局部性特点采样。在5个图流上把GTC与CoCoS(Conditional Counting and Sampling)、Tri-Fly进行对比的结果表明:GTC在通信数据量上比CoCoS减少了17%,比Tri-Fly减少了44%;在误差率上GTC比Tri-Fly减小了53%,略低于CoCoS;在算法运行时间上GTC比Tri-Fly减少了34%,略高于CoCoS。可见,GTC在保证较高准确率与较短算法运行时间的情况下,能有效减少通信数据量。The existing distributed triangle counting algorithms assume that all computing nodes are in the same location,while in reality,the nodes may be located in multiple data centers across continents.Geo-distributed data centers which are connected with wide area networks have characteristics of heterogeneous network bandwidth,high communication cost and uneven distribution,and the existing distributed algorithms cannot be applied to geo-distributed environment.At the same time,the existing research which ignores the temporal locality of the formation of triangles mostly adopts strategies such as random sampling and elimination of edges.Therefore,the triangle counting problem of real graph streams in geodistributed environment was studied,and a Geo-distributed Triangle Counting(GTC)algorithm was proposed.Firstly,aiming at the problem of too high data transmission caused by the existing edge distribution strategy,a geo-distributed edge distribution strategy was proposed to build a benefit formula combining the time benefit and data benefit of communication and use point-to-point communication to replace broadcast edges.Then,for the triangle repeated counting problem caused by point-to-point communication in geo-distributed environment,a final edge calculation rule was proposed to ensure no counting repetition.Finally,based on the time weighted sampling algorithm,a time-weighted triangle counting algorithm was proposed to use the time locality of the triangle to sample.The GTC was compared with Conditional Counting and Sampling(CoCoS)and Tri-Fly on five real graph streams.The results show that GTC has the communication data size decreased by 17% compared to CoCoS and decreased by 44% compared to Tri-Fly,GTC has the error rate decreased by 53% compared to Tri-Fly and slightly less than CoCoS,and GTC has the running time decreased by 34% compared to Tri-Fly and slightly more than CoCoS.It can be seen that the GTC can reduce the size of communication data effectively while ensuring high accuracy and short algorithm runnin

关 键 词:跨域 图流 三角计数 近似计算 采样 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象