A statistical aware based big data system computing framework

Cited by: 5

Authors: WEI Chenghao (魏丞昊); HUANG Zhexue (黄哲学); HE Yulin (何玉林) (College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R. China)

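Affiliation: [1] Institute of Big Data Technology and Applications, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China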

Source: Journal of Shenzhen University (Science and Engineering), 2018, No. 5, pp. 441-443 (3 pages)

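Funding: National Natural Science Foundation of China (61503252; 61473194); National Key Research and Development Program of China (2017YFC0822604-2); Shenzhen University Research Start-up Fund for Newly Recruited Teachers (2018060)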

Abstract: To make big data computable under limited computing resources, this study proposes Bigdata-α, a statistics-aware computing framework for Tbyte-scale big data systems. The core of the framework consists of a random sample partition model and an asymptotic ensemble learning model. The former guarantees that the samples in each data block produced by the partition follow a probability distribution consistent with that of the whole data set. The latter analyzes a number of random sample data blocks instead of the full Tbyte-scale data set, providing an unbiased and convergent learning model. The effectiveness of the random sample partition model is verified on a 1 Tbyte simulated data set. Gradually increasing the number of random sample blocks improves the classification accuracy of the base classifiers on the Higgs data set, showing that the method can overcome the computing-resource bottleneck in big data analysis.
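The two models named in the abstract can be illustrated with a small single-machine sketch. It is not the Bigdata-α implementation: the function names (`rsp_partition`, `train_threshold_classifier`, `ensemble_predict`), the synthetic one-feature data, the toy threshold learner, and the majority vote are all assumptions chosen for illustration. The sketch only shows the idea that blocks produced by a random shuffle resemble the whole data set, and that an ensemble can be grown block by block without touching the full data.

```python
import random
import statistics
from collections import Counter


def make_data(n, seed=1):
    """Synthetic stand-in data: label y in {0, 1}, feature x ~ N(2*y, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * y, 1.0), y))
    return data


def rsp_partition(records, num_blocks, seed=0):
    """Shuffle once, then deal the records into num_blocks near-equal blocks.

    Each block is then a random sample of the whole data set (hypothetical
    in-memory version; a Tbyte-scale partition would have to be distributed).
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def train_threshold_classifier(block):
    """Toy base learner: threshold at the midpoint of the two class means."""
    mean0 = statistics.fmean(x for x, y in block if y == 0)
    mean1 = statistics.fmean(x for x, y in block if y == 1)
    threshold = (mean0 + mean1) / 2
    sign = 1 if mean1 >= mean0 else -1
    return lambda x: int(sign * (x - threshold) > 0)


def ensemble_predict(models, x):
    """Majority vote over the base classifiers trained so far."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(models, test):
    """Fraction of test records the ensemble labels correctly."""
    return sum(ensemble_predict(models, x) == y for x, y in test) / len(test)


if __name__ == "__main__":
    data = make_data(20000)
    blocks = rsp_partition(data, num_blocks=10)
    test = make_data(2000, seed=2)

    # Grow the ensemble one random sample block at a time.
    models = []
    for k, block in enumerate(blocks[:5], start=1):
        models.append(train_threshold_classifier(block))
        print(k, "blocks -> accuracy", round(accuracy(models, test), 3))
```

In this sketch only five of the ten blocks are ever read; stopping once the accuracy no longer changes noticeably is one plausible reading of "gradually increasing the number of random sample blocks".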

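Keywords: computer system architecture; big data; random sample partition; asymptotic ensemble learning; parallel and distributed computing; distributed processing system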

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
