Algorithm for Mining Colossal Frequent Patterns Based on Core Pattern Fusion


Author: Tao Jianwen [1]

Affiliation: [1] Department of Information Engineering, Zhejiang Business Technology Institute, Ningbo 315012

Source: Journal of the China Society for Scientific and Technical Information (情报学报), 2008, No. 3, pp. 344-350 (7 pages)

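Funding: Supported by the Research Project of the Education Department of Zhejiang Province (20040120) and the Young Teachers Research Fund of the Education Department of Zhejiang Province.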

Abstract: Existing frequent pattern mining algorithms are poorly suited to applications that require mining colossal frequent patterns, such as bioinformatics data mining and graph pattern mining. This paper proposes a colossal frequent pattern mining algorithm, the Core Pattern Fusion Based Colossal Frequent Pattern Mining Algorithm (CPFCFPA), which fuses smaller core patterns in a single step to obtain a set that covers (implies) the complete set of colossal frequent patterns. The concept of itemset edit distance is introduced, and a novel model is proposed for evaluating the quality of colossal frequent pattern mining results. Experiments on real-time data sets show that CPFCFPA has good scalability and mining performance, and that for mining tasks which current frequent pattern mining algorithms find difficult or impossible, its results closely approximate the complete set of colossal frequent patterns.

Extensive research on frequent pattern mining in the past decade has brought forth a number of pattern mining algorithms that are both effective and efficient. However, the existing frequent pattern mining algorithms encounter challenges in mining rather large patterns, called colossal frequent patterns, in the presence of an explosive number of frequent patterns. Colossal patterns are critical to many applications, especially in domains like bioinformatics. In this study, we investigate a novel mining approach called Core Pattern Fusion (CPF) to efficiently find a good approximation to the colossal patterns. With CPF, a colossal pattern is discovered by fusing its small core patterns in one step, whereas the incremental pattern-growth mining strategies, such as those adopted in Apriori and FP-growth, have to examine a large number of mid-sized ones. This property distinguishes CPF from all the existing frequent pattern mining approaches and draws a new mining methodology. Our empirical studies show that, in cases where current mining algorithms cannot proceed, CPF is able to mine a result set which is a close enough approximation to the complete set of the colossal patterns, under a quality evaluation model proposed in this paper.
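The abstract does not give formal definitions, so the following is only a minimal sketch of the three ideas it names: core patterns, one-step fusion, and itemset edit distance. It assumes definitions commonly used in the pattern-fusion literature, which may differ from those in the paper: a subset beta of alpha is a tau-core pattern of alpha when |D_alpha| / |D_beta| >= tau (D_x being the set of transactions containing x), fusion is the union of the core patterns, and the edit distance between two itemsets is |alpha ∪ beta| - |alpha ∩ beta|. All function names and the toy data are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def support_set(pattern: Itemset, transactions: List[Itemset]) -> Set[int]:
    """Indices of the transactions that contain every item of `pattern`."""
    return {i for i, t in enumerate(transactions) if pattern <= t}


def is_core_pattern(beta: Itemset, alpha: Itemset,
                    transactions: List[Itemset], tau: float) -> bool:
    """Assumed definition: beta is a tau-core pattern of alpha if
    beta is a subset of alpha and |D_alpha| / |D_beta| >= tau."""
    if not beta <= alpha:
        return False
    d_alpha = support_set(alpha, transactions)
    d_beta = support_set(beta, transactions)
    return bool(d_beta) and len(d_alpha) / len(d_beta) >= tau


def fuse(patterns: Iterable[Itemset]) -> Itemset:
    """Assumed fusion step: merge several core patterns by taking their union."""
    return frozenset().union(*patterns)


def edit_distance(alpha: Itemset, beta: Itemset) -> int:
    """Assumed itemset edit distance: items in exactly one of the two itemsets."""
    return len(alpha | beta) - len(alpha & beta)


# Toy transaction database (hypothetical data, for illustration only).
transactions = [frozenset("abcde"), frozenset("abcde"),
                frozenset("abcf"), frozenset("ab")]
colossal = frozenset("abcde")
candidates = [frozenset("abc"), frozenset("de"), frozenset("ab")]

# Keep the candidates that qualify as core patterns, then fuse them in one step.
cores = [c for c in candidates
         if is_core_pattern(c, colossal, transactions, tau=0.6)]
fused = fuse(cores)
distance_to_target = edit_distance(fused, colossal)
```

In this toy example the fused pattern can be compared with a known colossal pattern through `edit_distance`; a quality model of the kind the abstract mentions would presumably aggregate such distances over a whole result set, but the exact form used in the paper is not stated here.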

Keywords: frequent pattern; core pattern; pattern fusion; mining algorithm; itemset

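Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; F222.1 [Automation and Computer Technology: Computer Science and Technology]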

 
