一种基于尾匹配q-gram的近似串匹配算法被引量：1

Approximate String Matching Using Tail Matched q-gram

机构地区：[1]渤海大学信息科学与技术学院,锦州121013 [2]渤海大学大学计算机教研部,锦州121013

出　　处：《计算机科学》2014年第6期243-249,共7页Computer Science

基　　金：辽宁省社科联2014年度辽宁经济社会发展立项重点课题(2014lslktzdian-04);国家自然科学基金项目(61173142;61202462);辽宁省教育厅一般项目(L2013422;L2012397);辽宁省"百千万人才工程"项目(2012921058)资助

摘　　要：近似串匹配是生物信息学、文本检索、信号处理等领域的一个基础问题,如何提高近似串匹配的速度一直都是研究的关键问题。提出一种新的在大文本库中快速查找近似匹配的无损过滤算法。为保证在大文本库中的匹配速度,本算法使用了查询速度较快的q-gram索引。为通过提高过滤算法的过滤效率达到提升算法整体性能的目的,详细分析了含有匹配串的文本区域,提取了一些基于尾匹配q-gram特征的新过滤条件,然后用这些特征优化了过滤算法的过滤标准。实验数据表明,新过滤条件有效地提高了算法的过滤效率,提升了算法的整体性能。结果显示新算法适合各种匹配错误率下的近似匹配,算法的通用性较强。Approximate string matching is a basic problem in computational biology,text retrieval and signal processing,etc.,and how to improve the matching speed is a key issue up to now.Here,a new lossless q-gram filter was proposed to enhance the performance of finding true matches in a large text database for a given query.Q-gram index is employed as the index structure of large text database for its fast searching speed.To improve new filter＇ s performance by enhancing its filtration efficiency,some new features based on tail matched q-gram were extracted from text that includes true matches.In new filter,more irrelevant text is eliminated in filtration phase by using a filtration criterion which is enhanced by new extracted features,and the unfiltered text is verified by smith-waterman algorithm to locate the positions of all true matches.The experimental results demonstrate that the proposed filter＇s filtration efficiency and performance are both enhanced.As a result,new filter algorithm has strong commonality and is suitable for approximate string matching on condition of various matching error ratio.

关键词：近似串匹配过滤算法 q-gram过滤 q元语法

分类号：TP391.3[自动化与计算机技术—计算机应用技术]

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

一种基于尾匹配q-gram的近似串匹配算法被引量：1

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

一种基于尾匹配q-gram的近似串匹配算法 被引量：1

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

一种基于尾匹配q-gram的近似串匹配算法被引量：1