检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:何慧敏[1,2,3] 刘燕兵[1,2,3] 谭建龙[1,3] 郭莉[1,3]
机构地区:[1]中国科学院计算技术研究所,北京100190 [2]中国科学院研究生院北京,100049 [3]信息内容安全技术国家工程实验室,北京100190
出 处:《计算机应用与软件》2011年第11期10-14,56,共6页Computer Applications and Software
基 金:国家自然科学基金项目(61070026);国家重点基础研究发展计划基金项目(2007CB311100)
摘 要:多模式串匹配算法是网络内容过滤系统的核心技术。巨大的存储空间开销是制约多模式匹配串算法应用的瓶颈之一。提出一种基于子串识别的多模式匹配算法—HashBOM,该算法利用位哈希表存储模式串的子串信息以大幅度减少存储空间,利用递归哈希函数计算字符串的哈希值以实现快速匹配。理论分析表明,该算法的空间复杂度为O(rm^2),优于基于子串识别的匹配算法BOM的空间复杂度O(mr|∑|log_2mr);该算法搜索匹配过程的平均时间复杂度为O(nlog|∑|)mr/m,与BOM算法相同(其中m为最短模式串的长度,r为模式串的个数,n为待匹配文本的长度,|∑|为字母表的大小)。在随机数据集和真实数据集上的实验表明,该算法的存储空间远远低于BOM算法,而匹配速度与BOM算法相当,非常适合在线实时匹配的应用环境。The multiple patterns string matching algorithm is a key technology in network content filtration systems. However, the enormous storage consumption is the bottleneck that prevents the multiple patterns matching string algorithm from being applied. The paper proposes a substring recognition based multiple patterns matching algorithm--HashBOM. The algorithm uses bit hash map to store the substring information of pattern strings in order to significantly reduce storage space and uses recursive hash functions to calculate strings' hash values to realize rapid matching. Theoretical analysis demonstrate that the space complexity of the algorithm is 0( rm^2 ) which is superior to substring recognition based matching algorithm, BOM, whose space complexity is 0 (mr|∑|log2 m); the algorithm's average time complexity for search matching process ismr|∑|mr/mwhich is as same as BOM, where m stands for minimal pattern string length,r stands for the number of the pattern strings,n stands for the length of the to-be-matched text, and |∑| stands for the size of the alphabet. Experiments on both synthetic dataset and real-world dataset show that the proposed algorithm runs as fast as BOM whereas consumes far less storage space so that it is very suitable for online real-time matching application environments.
关 键 词:多模式串匹配算法 位哈希表 递归哈希函数 空间压缩
分 类 号:TP301.5[自动化与计算机技术—计算机系统结构]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.46