Authors: YU Sheng-hui (余胜辉); LI Ling-juan (李玲娟) [1]
Affiliation: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China
Source: Computer Technology and Development (《计算机技术与发展》), 2020, No. 6, pp. 19-22 (4 pages)
Funding: National Key R&D Program of China (2017YFB1401302, 2017YFB0202200); National Natural Science Foundation of China (61572260, 61872196).
Abstract: With the arrival of the big data era, traditional computing models can no longer cope with such large data volumes. Spark, a parallel big data computing framework based on in-memory computing, addresses this problem well. CURE is a hierarchical clustering algorithm based on sampling and representative points; it iteratively merges the two closest clusters from the bottom up. Compared with traditional clustering algorithms, CURE is less sensitive to outliers. When processing large amounts of data, however, its repeated iterations consume a great deal of time. This paper uses the scalability and distributed nature of Spark's RDD programming model to parallelize the computation of the CURE algorithm, which speeds up data processing, lets the algorithm adapt to growing data scale, and improves clustering performance. Results of running the parallelized CURE algorithm on public datasets on Spark show that the Spark-based parallelization preserves clustering accuracy while improving the algorithm's time efficiency.
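The bottom-up merging that the abstract describes can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation, which the record does not show. It is a simplified illustration that assumes Euclidean distance, takes the distance between two clusters to be the minimum distance between their representative points, and omits CURE's sampling and outlier-handling steps. The function names (`representatives`, `cluster_distance`, `cure`) and the parameters `num_reps` and `shrink` are hypothetical names chosen for this example.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def representatives(points, num_reps=4, shrink=0.5):
    """Pick up to num_reps well-scattered points, then shrink them toward the centroid.

    Assumption: scattered points are chosen greedily (farthest from the
    centroid first, then farthest from the points already chosen).
    """
    unique = list(dict.fromkeys(points))  # drop duplicates, keep order
    center = centroid(points)
    chosen = []
    for _ in range(min(num_reps, len(unique))):
        candidates = [p for p in unique if p not in chosen]
        if chosen:
            best = max(candidates, key=lambda p: min(math.dist(p, r) for r in chosen))
        else:
            best = max(candidates, key=lambda p: math.dist(p, center))
        chosen.append(best)
    # Moving each representative toward the centroid damps the effect of outliers.
    return [tuple(x + shrink * (c - x) for x, c in zip(p, center)) for p in chosen]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters: closest pair of representative points."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cure(points, k, num_reps=4, shrink=0.5):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    reps = [representatives(c, num_reps, shrink) for c in clusters]
    while len(clusters) > k:
        # Exhaustive pairwise search: the repeated work that grows costly on large data.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: cluster_distance(reps[ab[0]], reps[ab[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j], reps[j]  # j > i, so index i stays valid
        clusters[i] = merged
        reps[i] = representatives(merged, num_reps, shrink)
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    for cluster in cure(data, k=2):
        print(sorted(cluster))
```

Every pass of the loop rescans all cluster pairs, which is one way to see why the sequential algorithm becomes slow as the data grows and why the distance computation is a natural candidate for distribution.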
Classification No.: TP301.6 [Automation and Computer Technology: Computer System Architecture]