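Affiliations: [1] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100039
Source: Journal of System Simulation (《系统仿真学报》), 2013, No. 5, pp. 936-944 (9 pages)
Funding: National Natural Science Foundation of China (61035003; 61072085; 61202212; 60933004); National 973 Program (2013CB329502); National 863 High-Tech Research and Development Program (2012AA011003); National Science and Technology Support Program (2012BA107B02)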
Abstract: Industry has begun to use cloud platforms to process massive high-dimensional data, simulating a variety of heterogeneous systems as a single system. Data mining in a Hadoop environment runs into several problems, including the global nature of data models, random write operations on HDFS files, and short data lifetimes. To address these problems and enable efficient large-scale data mining on Hadoop, an efficient data mining framework on Hadoop is proposed. The framework uses a database to simulate a linked-list structure, manages the mined knowledge, and provides distributed computation methods for tree structures and graph models. On this basis, a statistical algorithm, Yscore binning, is implemented, along with tree-building algorithms for decision trees and KD-trees. The Vega cloud is used to simulate the Hadoop cluster. Experimental data show that the framework and algorithms are practical and feasible, and that they may be extended to fields beyond data mining.
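Keywords: parallel data mining; decision tree algorithm; KD-tree algorithm; JPA; cloud computing
Classification: TP391.9 [Automation and Computer Technology: Computer Application Technology]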