A directed frequent subgraph mining algorithm based on an improved gSpan  Cited by: 2

Digraph frequent subgraph mining based on gSpan

Authors: Zhou Liuliu (周溜溜)[1], Ye Ning (业宁)[1]

Affiliation: [1] College of Information Technology, Nanjing Forestry University, Nanjing 210037

Source: Journal of Nanjing University (Natural Science), 2011, No. 5, pp. 532-543 (12 pages)

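Funding: National Natural Science Foundation of China (30671639); Natural Science Foundation of Jiangsu Province (BK2009393); Jiangsu Province Qing Lan Project for Academic Leaders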

Abstract: The proposed algorithm adapts gSpan to a wider range of inputs. Its graph encoding differs from the encodings used by traditional algorithms such as frequent subgraph mining (FSG), fast frequent subgraph mining (FFSM), and Apriori-based graph mining (AGM). Because it introduces a new two-dimensional feature definition for directed graphs, the algorithm extends effectively to learning on directed graphs; it is called the directed frequent subgraph mining algorithm based on an improved gSpan (DFSS). To date, most frequent subgraph mining has been knowledge discovery on undirected graphs, and little work mines directed graphs directly. Compared with earlier Apriori-based frequent graph mining algorithms such as FSG and AGM, the algorithm improves time complexity to some degree, which raises mining efficiency. Experimental results show that, without loss of mining completeness, it is 70 to 80 times as efficient as FFSM.

English abstract: As graph data is generated from ever more sources, mining meaningful patterns from such data sets grows more urgent, especially as developments in the life sciences yield a considerable number of directed graphs. However, existing related algorithms, such as Frequent Subgraph Discovery (FSG), Apriori-based Graph Mining (AGM), and Fast Frequent Subgraph Mining (FFSM), are all designed for undirected graphs. Apart from FFSM, no such algorithm targets directed graphs, and FFSM itself needs some modification before it can be used on them. In this paper we propose a new algorithm, DFSS, developed from a study of existing graph pattern mining algorithms, which detects frequent substructures directly in directed graphs. DFSS constructs a new data model to represent each graph and maps each graph to a unique minimum array code. Two new concepts, link degree and level degree, are introduced to support pattern mining on directed graphs. We also analyze the efficiency of FFSM and DFSS. The time complexity of DFSS is O(n^3·2^n), an improvement by a factor of O(m^4/n^2) over FFSM, where n is the number of frequent edges and m is the number of frequent vertices in the graph data set. Experiments show that DFSS achieves an orders-of-magnitude speedup over FFSM, confirming the theoretical analysis. In addition, the proposed algorithm is the first to operate directly on directed graph data sets, which should be significant for future related work.
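The abstract defines n as the number of frequent edges in the data set. As a rough illustration of what that quantity means for directed graphs, the sketch below counts how many graphs contain each labeled directed edge pattern and keeps those that meet a minimum support. It is not the DFSS encoding, which the abstract does not specify; the function name `frequent_directed_edges`, the graph representation, and the toy data are all assumptions made for this example.

```python
from collections import Counter

def frequent_directed_edges(graphs, min_support):
    """Return labeled directed edge patterns found in at least min_support graphs.

    Each graph is a pair (vertex_labels, edges), a hypothetical representation:
      vertex_labels: dict mapping vertex id -> label
      edges: list of (source_id, target_id, edge_label) triples
    """
    support = Counter()
    for vertex_labels, edges in graphs:
        # A pattern counts once per graph, however often it occurs inside it.
        # Direction matters: (A, x, B) and (B, x, A) are different patterns.
        patterns = {(vertex_labels[s], e, vertex_labels[d]) for s, d, e in edges}
        support.update(patterns)
    return {p: c for p, c in support.items() if c >= min_support}

# Toy data set of three small directed graphs (illustrative only).
graphs = [
    ({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "y")]),
    ({0: "A", 1: "B"}, [(0, 1, "x")]),
    ({0: "B", 1: "A"}, [(0, 1, "x")]),
]
result = frequent_directed_edges(graphs, min_support=2)
print(result)
```

In the toy data the third graph holds the same labels as the second but with the edge reversed, so it does not contribute to the support of the A-to-B pattern. An undirected miner would treat the two as the same edge.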

Keywords: directed graph mining; gSpan; frequent subgraph; applicability extension

Classification code: TP301.6 [Automation and Computer Technology - Computer System Architecture]

 
