Affiliation: [1] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, No. 4, pp. 7-9 (3 pages)
Funding: Hubei Province Key Science and Technology Research Project (2002AA103A06).
Abstract: A method is proposed for compressing inverted indices that keeps a high compression ratio while still allowing random access to the compressed data. The compressed data are divided into two parts. The first part records how many times a word occurs in each sub-interval, and the second part records the exact positions of the word within those sub-intervals. The query process is described in detail: using the information in the first part, decompression can start directly at any position in the second part, which is what gives the method its random access capability. The compression ratio and query efficiency are analyzed. The paper also describes how the method is implemented in the full-text retrieval system of an RDBMS (Relational Database Management System), including how the compressed data are stored in table form, and it optimizes the query algorithm for multi-keyword queries. This implementation takes full advantage of the database system to obtain good dynamic performance, reduces the storage space required by the inverted index, and improves query efficiency.
CLC number: TP301.6 [Automation and Computer Technology: Computer System Architecture]
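The abstract does not give the exact encoding, so the following is only a minimal sketch of one way a two-part layout can support random access. It is not the authors' algorithm. It makes several assumptions. Word positions are split into fixed-width sub-intervals of a hypothetical width `W`, which must be a power of two. Part 1 stores the occurrence count for each sub-interval, and the prefix sums of those counts locate each sub-interval's entries in part 2. Part 2 packs each position's offset within its sub-interval into log2(W) bits. The names `compress`, `positions_in` and `nth_position` are illustrative only.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class TwoPartPostings:
    """Hypothetical two-part layout (assumption, not the paper's format)."""
    width: int          # sub-interval width W (assumed power of two >= 2)
    bits: int           # bits per packed offset = log2(W)
    counts: List[int]   # part 1: occurrences of the word in each sub-interval
    prefix: List[int]   # prefix sums of counts; prefix[k] = entries before sub-interval k
    offsets: bytes      # part 2: offsets within sub-intervals, fixed-width bit-packed


def compress(positions: List[int], width: int) -> TwoPartPostings:
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two >= 2")
    bits = width.bit_length() - 1
    positions = sorted(positions)
    n_sub = positions[-1] // width + 1 if positions else 0
    counts = [0] * n_sub
    acc = 0
    for i, p in enumerate(positions):
        counts[p // width] += 1                 # part 1: count per sub-interval
        acc |= (p % width) << (i * bits)        # part 2: i-th offset at bit i*bits
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    nbytes = (len(positions) * bits + 7) // 8
    return TwoPartPostings(width, bits, counts, prefix, acc.to_bytes(nbytes, "little"))


def _read_offset(data: bytes, bits: int, i: int) -> int:
    """Decode only the bytes that hold the i-th offset (random access)."""
    start = i * bits
    lo, hi = start // 8, (start + bits - 1) // 8 + 1
    chunk = int.from_bytes(data[lo:hi], "little")
    return (chunk >> (start % 8)) & ((1 << bits) - 1)


def positions_in(pl: TwoPartPostings, k: int) -> List[int]:
    """All positions in sub-interval k, located through part 1 alone."""
    if k < 0 or k >= len(pl.counts):
        return []
    start, end = pl.prefix[k], pl.prefix[k + 1]
    return [k * pl.width + _read_offset(pl.offsets, pl.bits, i) for i in range(start, end)]


def nth_position(pl: TwoPartPostings, j: int) -> int:
    """The j-th occurrence (0-based): find its sub-interval, then decode one offset."""
    k = bisect.bisect_right(pl.prefix, j) - 1
    return k * pl.width + _read_offset(pl.offsets, pl.bits, j)
```

For example, `compress([3, 7, 17, 40, 41], 16)` yields the part 1 counts `[2, 1, 2]`. It packs five 4-bit offsets into 3 bytes, and `nth_position(pl, 2)` returns `17` without decoding the other entries. Fixed-width offsets give up some compression compared with variable-length codes such as Elias gamma. In exchange, any entry can be addressed directly, whereas variable-length codes must be decoded sequentially from the start of a block.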