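Affiliations: [1] Faculty of Information Technology, Beijing University of Technology, Beijing 100124 [2] Information Technology Management Department, Haitong Securities Co., Ltd., Shanghai 200001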
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2017, No. 5, pp. 717-728 (12 pages)
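Funding: National Natural Science Foundation of China (91646201; 91546111; 60803086); National Key Technology R&D Program of China, sub-project (2013BAH21B02-01); Beijing Natural Science Foundation (4153058; 4113076); Key Project of the Beijing Municipal Education Commission (KZ20160005009); General Project of the Beijing Municipal Education Commission (KM201710005023)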
Abstract: With the rapid development of computing and networking technologies and the growing range of data acquisition methods, demand for real-time processing of massive data keeps increasing, and traditional log analysis techniques hit a computational bottleneck at this scale. In the era of big data, the development of open processing platforms has given rise to big data processing systems that can handle large-scale, diverse data. To let existing business systems take full advantage of Hadoop, this study first investigates network log analysis methods based on big data technology and builds a network log analysis platform that supports collection, parsing, and storage of logs at the trillion-record scale, together with efficient and flexible querying and computation. It then compares three representative SQL-on-Hadoop query systems, Hive, Impala, and Spark SQL, and presents the performance characteristics of this class of system. The TPC-H benchmark is used to test and evaluate their decision-support capabilities, and several useful conclusions are drawn from the analysis and interpretation of the experimental data. Finally, several typical applications of massive log data computation and analysis in the securities field are implemented, laying a foundation for further research.
Keywords: big data; log analysis; data mining; Hadoop; query engine; data acquisition; index storage; securities industry
Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]