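Affiliation: [1] School of Software, Tsinghua University, Beijing 100084
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2016, No. 9, pp. 1211-1220 (10 pages)
Funding: Special Program for Big Data Science and Technology, Tsinghua National Laboratory for Information Science and Technology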
Abstract: A big data application system spans many technical stages, including data collection, storage, analysis, mining, and visualization. Each stage has multiple candidate solutions, several hundred systems are involved overall, and configuring them is complicated, which makes building a big data application system a major challenge for enterprises. To address the problem of component selection in big data application development, this paper establishes standardized requirement indicators and uses a decision tree model to select big data components automatically. Starting from several mainstream distributed storage systems and taking Cassandra as an example, it conducts experiments and uses multiple regression fitting to build a performance model over the hardware parameters; the user's requirements are then fed into this performance model to configure the hardware parameters. Finally, by studying each system's principles, architecture, characteristics, and application scenarios, it constructs a knowledge base of software parameter configurations to guide software parameter settings. Together, these steps address automatic component selection and parameter configuration in big data system development.
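The regression step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three hardware parameters (node count, cores per node, memory per node), the linear form of the model, the synthetic throughput values, and the helper names `fit_performance_model`, `predict`, and `cheapest_config`. It fits a linear performance model by least squares and then picks the first candidate configuration whose predicted throughput meets a user requirement.

```python
import numpy as np

# Hypothetical benchmark configurations (synthetic, NOT from the paper).
# Columns: node count, CPU cores per node, memory per node in GB.
hardware = np.array([
    [2, 4, 8],
    [2, 8, 16],
    [4, 4, 16],
    [4, 8, 8],
    [8, 4, 8],
    [8, 8, 16],
], dtype=float)

# Assumed ground-truth coefficients used only to generate synthetic data:
# intercept, per-node, per-core, and per-GB contributions to throughput.
TRUE_COEFFS = np.array([500.0, 1000.0, 200.0, 50.0])


def design_matrix(samples):
    """Prepend a column of ones so the model has an intercept term."""
    samples = np.atleast_2d(samples)
    return np.hstack([np.ones((samples.shape[0], 1)), samples])


# Synthetic "measured" throughput (operations per second).
throughput = design_matrix(hardware) @ TRUE_COEFFS


def fit_performance_model(samples, observed):
    """Fit a multiple linear regression by ordinary least squares."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(samples), observed, rcond=None)
    return coeffs


def predict(coeffs, config):
    """Predict throughput for one hardware configuration."""
    return float(design_matrix(config)[0] @ coeffs)


def cheapest_config(coeffs, candidates, required):
    """Return the first candidate (assumed ordered by cost) that meets
    the required throughput, or None if no candidate does."""
    for config in candidates:
        if predict(coeffs, config) >= required:
            return config
    return None


coeffs = fit_performance_model(hardware, throughput)
candidates = [(2, 4, 8), (4, 4, 8), (4, 8, 16)]
choice = cheapest_config(coeffs, candidates, required=5000)
```

A real performance model for Cassandra would likely need nonlinear terms and measured benchmark data; the linear form here only shows how a fitted model can turn a user requirement into a hardware choice.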
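Keywords: big data system; component selection; decision tree model; parameter configuration; performance model
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]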