Authors: ZHANG Feng (张凤), LU Juhui (卢居辉), ZHU Haiyong (朱海勇)[1], WU Wen (吴文)[1]
Affiliation: [1] Qiankun Big Data Operating System Research Institute, Xiamen Meiya Pico Information Co., Ltd., Xiamen 361001, Fujian, China
Source: Henan Science and Technology (《河南科技》), 2023, No. 15, pp. 19-24 (6 pages)
Abstract: [Purposes] This work aims to meet front-end users' need for frequent interaction and to overcome the drawbacks of traditional heavyweight clients, which must keep a long-lived connection session with the Spark application service. [Methods] A high-performance load-balancing and dynamic proxy component (HAProxy) is deployed on the edge node server, providing a way to submit Spark jobs from lightweight clients and to dynamically schedule Spark jobs and manage their full life cycle. [Findings] In Spark on YARN mode, multiple REST services that have identical functionality and can run independently of one another are deployed to the YARN cluster. HAProxy's automatic reload mechanism dynamically updates and loads the back-end service configuration, so that front-end users, unaware of back-end changes, submit Spark jobs through HAProxy's unified external interface to the interchangeable REST services distributed across the YARN cluster. [Conclusions] The method does not require a long-lived connection session between the edge node server and the cluster node servers. HAProxy effectively prevents external users from directly accessing internal cluster nodes, achieving security isolation between the inside and the outside of the cluster. The method also enables interactive submission and asynchronous scheduling of Spark jobs in Spark on YARN mode, giving autonomous control over the whole life cycle of Spark jobs.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
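The abstract describes HAProxy reloading its back-end configuration as REST services come and go on the YARN cluster, but gives no implementation detail. As a rough illustration only, the following sketch renders an HAProxy backend section from a list of currently running REST endpoints. The function name `render_backend`, the backend name, and the host/port values are hypothetical and are not taken from the paper; only the `backend`, `balance roundrobin`, and `server ... check` directives are standard HAProxy configuration syntax.

```python
from typing import List, Tuple


def render_backend(name: str, servers: List[Tuple[str, int]]) -> str:
    """Render an HAProxy backend section for the given (host, port) endpoints.

    Assumption: each endpoint is one of the interchangeable REST services
    running on the YARN cluster; how the endpoints are discovered is not
    covered here.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    for index, (host, port) in enumerate(servers, start=1):
        # "check" enables HAProxy health checks for this server.
        lines.append(f"    server rest{index} {host}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical endpoints, for illustration only.
    print(render_backend("spark_rest", [("node1", 8090), ("node2", 8091)]))
```

In a setup like the one the abstract outlines, a fragment such as this would presumably be written into the HAProxy configuration file and followed by a reload whenever the set of REST services changes; the paper's actual mechanism may differ.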