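Authors: Ma Xiaowen (马晓文)[1], Hu Xuegang (胡学钢)[1], Xie Fei (谢飞)[1,2], Guo Dan (郭丹)[1]
Affiliations: [1] School of Computer and Information, Hefei University of Technology, Hefei 230009; [2] Department of Computer Science and Technology, Hefei Normal University, Hefei 230601
Source: Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》), 2013, No. 2, pp. 226-234 (9 pages)
Funding: National "863" Program (2012AA011005); National Natural Science Foundation of China (60975034); Natural Science Foundation of Anhui Province (11040606M134)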
Abstract: Mining multi-sequence patterns with wildcards is an important research task in domains such as text retrieval, network security, and biological science, because it reveals the relationships among sequences. In previous work, the mining scale grows exponentially as the length of the multi-sequence increases, and those algorithms mine only patterns of limited length. We study the following problem: given multiple sequences s1, …, sn, a support threshold, and gap constraints, discover all frequent patterns whose number of occurrences in the multi-sequence is no less than the given threshold. A pattern P contains flexible wildcards, and the number of wildcards between any two successive elements of P must satisfy the user-specified gap constraints. We design an efficient mining algorithm, M-OneOffMine, in which pattern occurrences satisfy the one-off condition: each character in the given sequence can be used at most once across all occurrences of a pattern. Experiments on DNA sequences show that M-OneOffMine has better time performance than related sequential pattern mining algorithms. The time and space complexities of M-OneOffMine are O(kmnlw) and O(k(l+n)), respectively, where m is the number of frequent patterns, k is the number of element sequences, n is the length of the pattern, l is the length of the multiple sequence, and w is the flexibility of the gap constraint.
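To make the gap constraint and the one-off condition concrete, the following is a minimal Python sketch that counts one-off occurrences of a pattern in a single sequence. It is not M-OneOffMine, whose details the abstract does not give. It is a simple greedy left-to-right heuristic, so the count it returns may be lower than the maximum achievable. The names `one_off_support`, `min_gap`, and `max_gap` are hypothetical, and the gap is assumed to be the number of wildcard positions between two successive pattern elements.

```python
def _extend(seq, pattern, min_gap, max_gap, used, positions):
    """Try to complete a partial occurrence by depth-first search."""
    idx = len(positions)              # index of the next pattern element
    if idx == len(pattern):
        return True
    prev = positions[-1]
    lo = prev + 1 + min_gap           # skip at least min_gap wildcards
    hi = min(prev + 1 + max_gap, len(seq) - 1)
    for pos in range(lo, hi + 1):
        if not used[pos] and seq[pos] == pattern[idx]:
            positions.append(pos)
            if _extend(seq, pattern, min_gap, max_gap, used, positions):
                return True
            positions.pop()           # backtrack and try a later position
    return False


def one_off_support(seq, pattern, min_gap, max_gap):
    """Greedy count of occurrences in which no sequence position is reused.

    Assumption: a heuristic lower bound, not the algorithm from the paper.
    """
    if not pattern:
        return 0
    used = [False] * len(seq)         # one-off: each position used at most once
    count = 0
    for start in range(len(seq)):
        if used[start] or seq[start] != pattern[0]:
            continue
        positions = [start]
        if _extend(seq, pattern, min_gap, max_gap, used, positions):
            for pos in positions:
                used[pos] = True
            count += 1
    return count


if __name__ == "__main__":
    # Pattern "AC" with 0 to 1 wildcards allowed between A and C.
    print(one_off_support("ACGACGAC", "AC", 0, 1))  # 3
    # Both A's could pair with the single C, but C may be used only once.
    print(one_off_support("AAC", "AC", 0, 1))       # 1
```

The second call shows the effect of the one-off condition: without it, "AAC" would contain two occurrences of "AC" under the gap range [0, 1], but they share the same C, so only one is counted. How supports are aggregated across multiple sequences is not stated in the abstract, so the sketch stays with a single sequence.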
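Keywords: multi-sequence; gap constraint; wildcard; one-off condition; frequent pattern
Classification number: TP311.13 [Automation and Computer Technology: Computer Software and Theory]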