带通配符的模式匹配问题及其解空间特征分析  被引量:1

Characteristic Analysis of Pattern Matching with Wildcards Problem and its Solution Space

在线阅读下载全文

作  者:项泰宁 郭丹[1] 王海平[1] 胡学钢[1] 

机构地区:[1]合肥工业大学计算机与信息学院,合肥230009

出  处:《计算机科学》2014年第9期269-273,310,共6页Computer Science

基  金:国家自然科学基金:港澳学者合作研究基金项目(61229301);国家自然科学基金(61305062);博士后面上基金项目(2012M511403);安徽省自然科学基金(1308085QF102)资助

摘  要:随着生物信息学、信息检索等领域的发展,带有通配符和长度约束的模式匹配问题引起了广泛关注。该问题扩展了精确模式匹配问题,使匹配更加灵活,同时也增加了匹配的复杂性,极大地提高了非线性匹配算法的复杂度。求解该问题的匹配算法的效率与问题的解空间密切相关,而目前针对该问题的解空间及其特征尚缺乏系统的研究。鉴于此,描述了该问题的解空间,并分析了解空间的可分性。之后,提出解空间划分算法SPLIT,并分析了SPLIT的时间复杂性。实验部分以3个匹配算法为对照,在真实DNA数据集下,使用了5109组模式。实验结果表明,SPLIT不影响匹配解的结构,且可以有效降低非线性匹配算法的时间消耗。With the developments in bioinformatics and information retrieval, pattern matching with wildcards and length constraints has attracted wild attention. This problem extends the exact pattern matching, so it brings more flexibility as well as complexity. Meanwhile, the time complexity of non-linear algorithms is greatly increased. The solution space makes great difference to the efficiency of matching algorithms, but the characteristics of the problem solution space is still lack of systematic research. Therefore, the problem solution space was presented, and the divisibility of the space was analyzed. Afterwards, a solution space partitioning algorithm, named SPLIT, was proposed, and its time complexity was also illustrated. In our experiments, three pattern matching algorithms were used as baselines running on real DNA data sets with 5109 sets of patterns. Our empirical results verify that SPLIT can effectively reduce the time consumption of non-linear matching algorithms without influencing the matching solution.

关 键 词:解空间 分割 模式匹配 通配符 

分 类 号:TP301[自动化与计算机技术—计算机系统结构]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象