检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]合肥工业大学计算机与信息学院,合肥230009
出 处:《计算机科学》2014年第9期269-273,310,共6页Computer Science
基 金:国家自然科学基金:港澳学者合作研究基金项目(61229301);国家自然科学基金(61305062);博士后面上基金项目(2012M511403);安徽省自然科学基金(1308085QF102)资助
摘 要:随着生物信息学、信息检索等领域的发展,带有通配符和长度约束的模式匹配问题引起了广泛关注。该问题扩展了精确模式匹配问题,使匹配更加灵活,同时也增加了匹配的复杂性,极大地提高了非线性匹配算法的复杂度。求解该问题的匹配算法的效率与问题的解空间密切相关,而目前针对该问题的解空间及其特征尚缺乏系统的研究。鉴于此,描述了该问题的解空间,并分析了解空间的可分性。之后,提出解空间划分算法SPLIT,并分析了SPLIT的时间复杂性。实验部分以3个匹配算法为对照,在真实DNA数据集下,使用了5109组模式。实验结果表明,SPLIT不影响匹配解的结构,且可以有效降低非线性匹配算法的时间消耗。With the developments in bioinformatics and information retrieval, pattern matching with wildcards and length constraints has attracted wild attention. This problem extends the exact pattern matching, so it brings more flexibility as well as complexity. Meanwhile, the time complexity of non-linear algorithms is greatly increased. The solution space makes great difference to the efficiency of matching algorithms, but the characteristics of the problem solution space is still lack of systematic research. Therefore, the problem solution space was presented, and the divisibility of the space was analyzed. Afterwards, a solution space partitioning algorithm, named SPLIT, was proposed, and its time complexity was also illustrated. In our experiments, three pattern matching algorithms were used as baselines running on real DNA data sets with 5109 sets of patterns. Our empirical results verify that SPLIT can effectively reduce the time consumption of non-linear matching algorithms without influencing the matching solution.
分 类 号:TP301[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.145