周期性一般间隙约束的序列模式挖掘  被引量:12

Mining Sequential Patterns with Periodic General Gap Constraints

在线阅读下载全文

作  者:武优西[1] 周坤[1] 刘靖宇[1] 江贺[2] 吴信东[3,4] WU You-Xi ZHOU Kun LIU Jing-Yu JIANG He WU Xin-Donga(School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401 School of Software, Dalian University of Technology, Dalian, Liaoning 116621 School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009 Department of Computer Science, University of Vermont, Burlington 05405, USA)

机构地区:[1]河北工业大学计算机科学与软件学院,天津300401 [2]大连理工大学软件学院,辽宁大连116621 [3]合肥工业大学计算机科学与信息工程学院,合肥230009 [4]佛蒙特大学计算机系佛蒙特州伯灵顿美国05405

出  处:《计算机学报》2017年第6期1338-1352,共15页Chinese Journal of Computers

基  金:国家自然科学基金(61229301);教育部创新团队项目(IRT13059);河北省自然科学基金(F2013202138);河北省教育厅重点项目(ZD2014009);河北省教育厅青年基金(QN2014192)资助~~

摘  要:序列模式挖掘是从给定序列中发现出现频率高的模式的一种方法,目前已在诸多领域被广泛应用.假定子模式p_i和p_j(i<j)可以分别匹配事件A和事件B,传统的序列模式挖掘方法能够对事件B在事件A之后的序列进行检测,而不能对事件B发生在事件A之前的序列进行识别.为了解决此问题,文中提出了周期性一般间隙约束的序列模式挖掘问题,该问题具有如下5个特点:间隙约束的最小值可为负值的一般间隙约束;每个间隙约束都相同的周期性模式;在支持数统计方面无特殊约束,即允许序列中事件多次使用;该挖掘问题满足Apriori性质;挖掘支持率大于给定的频繁度阈值的频繁模式.为了进行有效地挖掘,采用深度优先的方式建立模式树.文中采用模式匹配技术,在一遍扫描序列数据库的情况下,建立其所有超模式的不完整网树森林(不完整网树是网树的最后一层结点,可以存储在一个数组中,可以有效地表示一个模式在一个序列中的支持数),并对这些超模式的支持率进行有效地计算,进而挖掘出所有频繁模式,有效地提高了序列模式挖掘速度.实验结果验证了文中算法的可行性和有效性.Sequential pattern plays an essential role in many Pi and Pj (i〈j) can match even the sequences in which event B mining is to discover the frequent patterns in the sequences and critical data mining tasks with broad applications. Given sub-patterns ts A and B respectively, traditional pattern mining methods can detect is after event A, but fail to find the sequences with event B occurring before event A. To tackle this challenge, in this paper, we propose sequential pattern mining with periodic general gap constraints with five characteristics as follows. The minimal gap constraint, namely general gap constraint, can be a negative value. All gap constraints of the pattern are the same. Any event in the sequence can be used more than once in different supports. The problem satisfies the Apriori property under the new definition of offset sequences. If the support ratio of a pattern is greater than the given threshold, the pattern is a frequent pattern. To solve the problem effectively, a depth-first search strategy is used to create the pattern tree. An Incom- plete Nettree structure, the last layer of a Nettree which can be stored in an array, can represent the support of a pattern in the sequence. Pattern matching method is used to create the Incomplete Nettrees for all super patterns of a frequent pattern by one way scan over the sequence database. Therefore, we can calculate the support ratios of these super patterns and find the frequent ones. Experimental results validate the feasibility and effectiveness of the proposed algorithms.

关 键 词:序列模式挖掘 一般间隙 频繁模式 模式匹配 APRIORI性质 人工智能 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象