检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
作 者:郭瑞杰[1,2] 程学旗[1] 许洪波[1] 王斌[1] 丁国栋[1]
机构地区:[1]中国科学院计算技术研究所,北京100190 [2]中国科学院研究生院,北京100049
出 处:《计算机研究与发展》2008年第10期1769-1775,共7页Journal of Computer Research and Development
基 金:国家“八六三”高技术研究发展计划基金项目(2006AA010105,2007AA01Z416);国家“九七三”重点基础研究发展规划基金项目(2004CB318109)~~
摘 要:倒排索引的构建可以通过离线方式高效地完成,但是仅当整个数据集索引完毕后方可提供检索服务.在线索引可以在构建倒排索引的同时提供检索服务,新加入的文档即刻可供检索.提出了一种基于动态平衡树的在线索引更新策略,利用动态平衡树控制索引合并过程,使索引合并总是在大小相近的子索引之间进行,以减少索引合并代价,同时可以调节索引和检索之间的性能平衡.该方法提供了一个基于合并的在线索引更新框架,与已有方法相比具有更好的通用性、更高的性能和更好的规模可扩展性.在由4000万张网页构成的270GBWeb数据集上运行的实验表明,该方法在实际系统中是高效的,将索引更新的性能提高了92.28%,而检索性能仅下降4.79%,大幅度降低了在线索引构建的代价.Examined in this paper is the issues of on-line maintenance for growing text collections. This approach builds inverted index to reflect changes in the collection it describes, supporting efficient document insertions and deletions. A fast on-line index construction method based on dynamic balancing tree is proposed. This method takes a merge-based approach and aims to keep a good performance during indexing and query processing by always merging indices with similar sizes. This is achieved by using a tree in which each node corresponds to one sub-index. The placing of sub-indexes in the tree depends on the sizes of the sub-indexes, so that each merging takes place among similarly sized sub-indexes. The method is suited very well not only for growing text collections, but also for dynamic text collections, and is a uniform framework for merge-based index maintenance strategies. It is more general, faster and scales better than previous methods, and can be adapted to suit different balances of insertion and querying operations. Using experiments on large scale Web data, the efficiency, flexibility and scalability of this method are demonstrated in practice, showing that on-line index construction for growing text collections can be performed efficiently and almost as fast as for static text collections.
关 键 词:信息检索 在线索引 索引性能 检索性能 动态平衡树
分 类 号:TP391.3[自动化与计算机技术—计算机应用技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.145