An efficient method to mine strong jumping emerging patterns from high-dimensional datasets

Cited by: 2


Authors: Liu Quanzhong (刘全中)[1], Nie Yanming (聂艳明)[1], Ning Jifeng (宁纪锋)[1]

Affiliation: [1] College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2013, No. 8, pp. 55-60 (6 pages)

Funding: National Natural Science Foundation of China (61003151); Fundamental Research Funds for the Central Universities (QN2012033; QN2013053)

Abstract: The classical contrast pattern tree (CP-tree) method mines strong jumping emerging patterns (SJEPs) efficiently only on low-dimensional datasets. To address this limitation, a method for mining SJEPs from high-dimensional datasets is proposed. First, a dynamic contrast pattern tree (DCP-tree) structure is designed to store grown patterns and their key information. Then, an initial DCP-tree is constructed to store frequent items and their bit strings over the positive and negative classes. Finally, an SJEP mining algorithm based on the initial DCP-tree is developed. Experiments on high-dimensional cancer gene expression datasets show that, compared with the classical CP-tree method and its improved variant, the proposed method mines faster and effectively handles higher-dimensional data. Within an acceptable amount of time, it can also mine some important SJEPs that the CP-tree methods fail to discover.
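The bit-string idea mentioned in the abstract can be illustrated with a small sketch. This is not the DCP-tree algorithm from the paper, whose details are not given here; it is a naive brute-force enumeration. It assumes the usual definition of an SJEP as a minimal itemset whose support count in one class meets a minimum threshold while its support in the other class is zero. The function names (`item_bitmaps`, `support`, `naive_sjeps`), the threshold `min_count`, and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def item_bitmaps(rows, items):
    """Map each item to an int whose bit i is set when row i contains the item."""
    return {it: sum(1 << i for i, row in enumerate(rows) if it in row)
            for it in items}


def support(itemset, bitmaps, n_rows):
    """Count rows containing every item: AND the item bit strings, then popcount."""
    mask = (1 << n_rows) - 1
    for it in itemset:
        mask &= bitmaps[it]
    return bin(mask).count("1")


def naive_sjeps(pos_rows, neg_rows, min_count):
    """Brute-force search for minimal itemsets that occur in at least
    min_count positive rows and in no negative row (assumed SJEP definition).
    Exponential in the number of items; for illustration only."""
    items = sorted(set().union(*pos_rows, *neg_rows))
    pos = item_bitmaps(pos_rows, items)
    neg = item_bitmaps(neg_rows, items)
    found = []
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            # Skip supersets of an already found pattern: they are not minimal.
            if any(set(f) <= set(cand) for f in found):
                continue
            if (support(cand, pos, len(pos_rows)) >= min_count
                    and support(cand, neg, len(neg_rows)) == 0):
                found.append(cand)
    return found


if __name__ == "__main__":
    pos_rows = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]  # toy positive class
    neg_rows = [{"a", "c"}, {"b"}, {"a"}]                 # toy negative class
    print(naive_sjeps(pos_rows, neg_rows, min_count=2))
    # prints [('a', 'b'), ('b', 'c')]
```

Storing one bit string per item and class reduces each support count to bitwise AND operations followed by a population count, which is presumably why the initial DCP-tree keeps these strings alongside the frequent items. The sketch only finds patterns present in the positive class and absent from the negative class; swapping the two arguments gives the other direction.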

Keywords: data mining; strong jumping emerging patterns; contrast pattern tree; frequent patterns; pattern pruning

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
