一种有效的字符串有序跳跃模式近似匹配算法  被引量:2

Efficient Skip-Pattern Matching Algorithm for Approximate String Sequential Problem

在线阅读下载全文

作  者:沈洲[1] 王永成[1] 刘功申[1] 

机构地区:[1]上海交通大学电子信息学院,上海200030

出  处:《数据采集与处理》2001年第4期459-465,共7页Journal of Data Acquisition and Processing

基  金:国家 8 6 3计划 (编号 :86 3- 30 6 - ZD0 3- 0 4 - 1)资助项目

摘  要:字符串的模式匹配问题是计算机科学的基本问题之一 ,而近似模式匹配更是近期的研究热点。本文分析了文本分析领域中出现的一种特殊的近似模式匹配问题 ,即字符串有序跳跃模式近似匹配问题 ,提出了一种基于有限自动机的组件组合分析算法。算法的特点在于将组件匹配过程与组配过程进行分离 ,这样既降低了问题的复杂度 ,又可以实现按策略组配的灵活性。组件匹配过程中利用有限自动机对跳跃模式的组件进行匹配查找 ;组件的组配过程中先对查找到的组件进行组合分析 ,然后再对各种组合进行初步筛选和基于策略的优选。初步筛选工作是依据顺序性、唯一性和最大数三条原则进行 ;而优选工作是根据四个设计的评价参数选择其中最佳组合。实验结果表明 ,该算法的确能解决字符串有序跳跃模式匹配问题 。The string pattern matching is always viewed as a fundamental problem in computer science. Furthermore, the approximate pattern matching has attracted more interest recently and now becomes a new research field. This paper analyzes a kind of special approximate pattern matching problem i.e. sequential skip pattern matching in the field of text analysis. A component combinatorial analysis algorithm based on the deterministic state automata is presented. The main feature of the algorithm is that the process of component matching and assembling is separated. So, it not only decrease the complexity of problem, but also provide the flexible solution for assembling component depended on policy. The automata is used to search the components of the patterns in the process of matching component. In the process of assembling the components, combinatorial analysis is done firstly for the found components, and preliminary selection and further selection depended on po licy are made secondly. The preliminary selection is based on three principles: sequential rule, sole rule, and maximum occurrence number rule. And the further selection is based four parameters of evaluation to select the best result. Experiments demonstrate that the algorithm can solve such approximate string matching problem, and it can be used in the filed of sentence pattern matching and skip word matching.

关 键 词:有限状态自动机 字符串 模式匹配 计算机 有序跳跃模式 

分 类 号:TP391.43[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象