Scalability testing and analysis of the collaborative filtering algorithm on Spark  (Times cited: 2)

Scalability testing and analyzing of ALS on Spark


Authors: SHEN Wen-ting, LIU Cai-zheng, SUN Lei, LI Hui[1], XU Li-jie, WANG Wei[1,2,3]

Affiliations: [1] Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; [3] State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; [4] Tianjin Massive Data Processing Technology Laboratory, Tianjin Shenzhou General Data Technology Limited Company, Tianjin 300384, China

Source: Computer Engineering and Design, 2019, Issue 6, pp. 1574-1579 (6 pages)

Funding: National Natural Science Foundation of China (61572480); Beijing Municipal Major Project Fund (D171100003417002)

Abstract: Linear scalability of a distributed machine learning algorithm requires that its computational performance keep growing close to linearly as the number of nodes increases. To address the shortcomings of current scalability testing of the ALS (alternating least squares) algorithm, a multi-dimensional scalability testing method is proposed: horizontal (lateral) tests measure scalability, and vertical (longitudinal) tests locate the scalability bottleneck. The ALS algorithm in Spark MLlib was tested on real-world datasets. Experimental results show that, under certain conditions, the algorithm is sensitive to the nodes it runs on, and tasks can become concentrated on a single node. In addition, as task parallelism increases, the execution time increases and efficiency decreases.
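The linear-scalability criterion in the abstract can be made concrete with two standard metrics: speedup relative to a baseline cluster size, and parallel efficiency (speedup divided by the ideal linear speedup). The sketch below assumes these textbook definitions; the paper's own metrics may differ, and the function name `scalability_report` and the timing values are hypothetical and for illustration only.

```python
from typing import Dict, Tuple


def scalability_report(times_by_nodes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return {nodes: (speedup, efficiency)} relative to the smallest node count.

    speedup(n)    = T(base) / T(n)
    efficiency(n) = speedup(n) / (n / base)   # 1.0 means perfectly linear scaling
    """
    if not times_by_nodes:
        raise ValueError("need at least one measurement")
    base = min(times_by_nodes)          # smallest cluster is the baseline
    base_time = times_by_nodes[base]
    report = {}
    for nodes in sorted(times_by_nodes):
        speedup = base_time / times_by_nodes[nodes]
        ideal = nodes / base            # ideal linear speedup
        report[nodes] = (speedup, speedup / ideal)
    return report


if __name__ == "__main__":
    # Hypothetical wall-clock times (seconds) for one ALS training job;
    # illustrative numbers only, not results from the paper.
    measured = {2: 400.0, 4: 220.0, 8: 130.0}
    for n, (s, e) in scalability_report(measured).items():
        print(f"{n} nodes: speedup={s:.2f}, efficiency={e:.2f}")
```

An efficiency that drops steadily below 1.0 as nodes are added is the kind of sublinear behavior a horizontal test would flag, after which vertical tests would look for the cause (for example, tasks piling up on one node).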

Keywords: distributed machine learning algorithm; alternating least squares; scalability; multi-dimensional testing; test findings

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
