检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:戴东波[1] 汤春蕾[1] 邱伯仁[1] 熊赟[1] 朱扬勇[1]
机构地区:[1]复旦大学计算机科学技术学院,上海200433
出 处:《计算机研究与发展》2010年第10期1785-1796,共12页Journal of Computer Research and Development
基 金:上海市重点学科建设基金项目(B114);上海市科委基金项目(08511500203)~~
摘 要:序列数据一类重要的数据类型,在文本、Web访问日志文件、生物数据库等应用中普遍存在,对其进行相似性查询是一种获取有用信息的重要手段.在大型序列数据库中进行高效相似性查询的关键因素之一就是查询算法的过滤能力,即设计能快速过滤与查询序列不相关序列集的过滤器十分重要.提出了结合序列距离的度量性质和序列自身特征的多重过滤算法SSQ_MF,SSQ_MF使用了长度过滤器、前缀过滤器和基于参考集的过滤器,使得算法过滤能力较基于单一过滤器算法进一步增强.此外,设计了有关数据结构对查询数据库的一些统计信息进行了预计算和保存,有效估计了各过滤器的过滤集大小,并构建了一个由过滤集大小确定的最优过滤顺序模型,使得算法的过滤代价最低.实验结果表明,算法SSQ_MF的查询性能优于单一过滤器算法和随机过滤顺序的多过滤器算法.Sequence data is an important data type,ubiquitous in many domains such as text,Web access log and biological database.Similarity query in this kind of data is a very important means for extracting useful information.One key factor for high performance of similarity query in huge sequence database is the filtering level of query algorithm,namely,designing those filters that can quickly filter out the unpromising strings for query string is very important.Combining metric property of sequences' distance with sequences' characteristics,an algorithm called SSQ_MF based on multiple filters is proposed,whose filtering level is further improved compared with those query algorithms with a single filter.The filters used in SSQ_MF are length filter,prefix filter,and reference-based filter.Then,the statistical information about the string database is pre-computed and some data structures are used to store those information.Furthermore,every filter's filtering size is effectively estimated and a model in which optimal filtering order is determined by every filter's filtering size is built.Comprehensive experimental results demonstrate that in terms of query performance,SSQ_MF is better than algorithms with a single filter and algorithms with multiple filters but executing in a random order.
关 键 词:序列数据 相似性查询 过滤器 过滤顺序 度量空间
分 类 号:TP311.13[自动化与计算机技术—计算机软件与理论]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:3.16.216.138