Inverted index compression and its implementation in RDBMS full-text retrieval  (cited by: 3)

Compression of inverted indices and its implementation in the full-text information retrieval system of an RDBMS


Authors: Zhu Hong (朱虹)[1], Wu Lin (吴林)[1]

Affiliation: [1] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China

Source: Journal of Huazhong University of Science and Technology (Natural Science Edition), 2005, No. 4, pp. 7-9 (3 pages)

Funding: Hubei Province Key Science and Technology Project (2002AA103A06).

Abstract: A method for compressing inverted indices is proposed that provides random access to the compressed data while maintaining a high compression ratio. The compressed data are divided into two parts: the first records the number of occurrences of a word in each sub-interval, and the second records the exact positions of the word within those sub-intervals. The query process is described in detail; using the information in the first part, the second part can be decompressed directly at any position, which demonstrates the random access capability. The compression ratio and query efficiency are analyzed. The implementation of the method in the full-text retrieval of an RDBMS (Relational Database Management System) is discussed, including how the index is stored in table form, and the query algorithm is optimized for multi-keyword searches. The implementation takes full advantage of the database system to obtain good dynamic performance, while reducing the storage space required by the inverted index and improving query efficiency.
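The two-part layout described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual encoding, so everything concrete here is an assumption made for illustration: the fixed sub-interval width `BLOCK`, the plain Python lists standing in for bit-packed data, and the function names `compress` and `block_positions`.

```python
BLOCK = 16  # hypothetical sub-interval width; the record does not state one


def compress(positions, universe):
    """Split sorted word positions into two parts.

    Part one (counts): number of occurrences in each sub-interval.
    Part two (offsets): position of each occurrence inside its sub-interval.
    """
    n_blocks = (universe + BLOCK - 1) // BLOCK
    counts = [0] * n_blocks
    offsets = []
    for p in positions:  # positions are assumed sorted in ascending order
        counts[p // BLOCK] += 1
        offsets.append(p % BLOCK)
    return counts, offsets


def block_positions(counts, offsets, b):
    """Decode only sub-interval b.

    The prefix sum over part one locates the slice of part two that
    belongs to sub-interval b, so no other sub-interval is decoded.
    """
    start = sum(counts[:b])
    return [b * BLOCK + o for o in offsets[start:start + counts[b]]]


counts, offsets = compress([3, 17, 18, 40, 63], universe=64)
second_block = block_positions(counts, offsets, 1)
```

In this sketch the random access comes from the counts alone: summing the counts before a sub-interval gives the index in the second part where its entries begin. A real implementation would presumably pack both parts into compact codes rather than Python lists.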

Keywords: full-text retrieval; inverted index; index compression; encoding

Classification: TP301.6 [Automation and Computer Technology: Computer System Architecture]

 
