Spark环境下基于子图的异步迭代更新方法  被引量:1

Asynchronous Iterative Updates Method Based on Subgraph in Spark

在线阅读下载全文

作  者:李超 董新华 陈建峡[1] LI Chao;DONG Xinhua;CHEN Jianxia(School of Computer Science,Hubei University of Technology,Wuhan 430068,China)

机构地区:[1]湖北工业大学计算机学院,武汉430068

出  处:《计算机工程与应用》2020年第7期67-73,共7页Computer Engineering and Applications

基  金:国家自然科学基金(No.61502155);湖北省科技厅自然科学基金(No.2017CFB326)。

摘  要:全局同步计算模型简单易用,但是路障同步导致收敛速度变慢。以顶点为中心的异步迭代虽然提高了收敛速度,但在计算节点之间需要频繁发送信息。在Spark环境下提出一种基于子图的异步迭代更新方法。在子图之间建立异步消息通信连接后,子图能以异步方式发送数据块;通过多线程同步避免数据读写冲突,保证异步更新时顶点状态的一致性。在大规模样本数据集上分别从收敛结果、收敛速度和通信代价验证方法有效性。实验结果表明,与全局同步迭代相比,该方法有效提高了计算收敛速度。与顶点为中心的异步更新方式相比,该方法在收敛时间上略有增长,但是显著降低了通信开销。Bulk synchronous parallel model is easy to program. However, long waiting time is required for each vertex to step into next round in the BSP models, and frequent messages-passing are incurred by vertex-centric asynchronous methods. To accelerate the execution of iterative graph computations, this paper proposes an asynchronous iterative method in Spark, and exploits two means to guarantee the validity. Firstly, by leveraging remote procedure call to establish connections, data blocks can be transmitted asynchronously among vertex partitions and edge partitions. Secondly, to guarantee data consistency during updating, synchronization is adopted to make threads exclusive access to vertex state. Experiments on large scale graph data are conducted on a local cluster. Comparing with the BSP and vertex-centric model, the proposed method not only accelerates iteration, but also improves communication efficiency.

关 键 词:子图 异步更新 Spark环境 图数据 图切分 

分 类 号:TP393[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象