Authors: Liao Bin (廖彬), Zhang Tao (张陶) [2], Yu Jiong (于炯) [2], Huang Jinglai (黄静莱), Guo Binglei (国冰磊), Liu Yan (刘炎) [3]
Affiliations: [1] College of Statistics & Data Science, Xinjiang University of Finance & Economics, Urumqi 830012, China; [2] School of Information Science & Engineering, Xinjiang University, Urumqi 830008, China; [3] School of Software, Tsinghua University, Beijing 100084, China
Source: Application Research of Computers (《计算机应用研究》), 2020, No. 5, pp. 1321-1325 (5 pages)
Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01B014).
Abstract: Every MapReduce job independently performs a series of complex operations such as task scheduling and resource allocation. As a result, when several MapReduce jobs cooperate to implement one algorithm, they incur a large amount of redundant disk I/O and repeated resource requests, and resource utilization during computation is low. Big data mining algorithms are typically split into multiple cooperating MapReduce jobs. Taking the ItemBased algorithm as an example, this paper analyzes the resource efficiency problems of big data mining algorithms that run as multiple cooperating MapReduce jobs, and proposes an ItemBased algorithm based on DistributedCache. The algorithm uses DistributedCache to cache the I/O data exchanged between the MapReduce jobs, which overcomes the drawback of the jobs being independent of one another and reduces the waiting delay between map and reduce tasks. Experimental results show that DistributedCache speeds up data reading for MapReduce jobs, and that the algorithm restructured around DistributedCache greatly reduces the waiting delay between map and reduce tasks and improves resource efficiency by more than three times.
Keywords: MapReduce optimization; ItemBased algorithm; in-memory file system; I/O efficiency; resource optimization
Classification: TP393.09 [Automation and Computer Technology: Computer Application Technology]