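Affiliation: [1] School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2014, No. 7, pp. 8-13 (6 pages)
Funding: National Natural Science Foundation of China (61173024); Guangdong Province and Ministry Industry-University-Research Cooperation Fund (2011A090200037)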
Abstract: To address the search performance problems of traditional distributed search engines, the traditional model is improved in both its index structure and its query algorithm, and a non-centralized, highly parallel search model is proposed. In this model, the index is classified by document topic, a bitmap structure is used for longer inverted lists, and a parallel search algorithm (multi max score heap, MMSH) is implemented on the index nodes using multi-threading. Experimental results show that index classification and the bitmap strategy for inverted lists make queries in the Merge layer more targeted and reduce the CPU and memory cost of Merge-layer nodes. When the inverted lists cannot be held entirely in memory, the MMSH algorithm achieves highly parallel querying; its query efficiency is higher than that of the classical term-at-a-time algorithm, which shortens the average search time and improves system throughput. Index classification, the bitmap structure, and the parallel query algorithm together avoid blind querying and improve the performance of distributed search engines.
Classification: TP393 [Automation and Computer Technology - Computer Application Technology]
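The abstract names two ideas that are easy to picture in code: storing long inverted lists as bitmaps, and querying index partitions in parallel with threads. The following is only a minimal sketch of those two ideas, not the paper's MMSH algorithm, which this record does not describe in detail. The names `BITMAP_THRESHOLD`, `encode`, `search_shard`, and `parallel_search` are hypothetical, as is the assumption that each shard holds a disjoint set of integer document IDs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cutoff: posting lists longer than this are stored as bitmaps.
BITMAP_THRESHOLD = 4


def encode(doc_ids):
    """Store a long posting list as an int bitmap, a short one as a sorted tuple."""
    ids = sorted(set(doc_ids))
    if len(ids) > BITMAP_THRESHOLD:
        bitmap = 0
        for d in ids:
            bitmap |= 1 << d  # bit d is set when document d contains the term
        return ("bitmap", bitmap)
    return ("list", tuple(ids))


def size(posting):
    """Number of documents in a posting list of either kind."""
    kind, data = posting
    return bin(data).count("1") if kind == "bitmap" else len(data)


def contains(posting, doc_id):
    """Membership test: a bit probe for bitmaps, a scan for short lists."""
    kind, data = posting
    if kind == "bitmap":
        return (data >> doc_id) & 1 == 1
    return doc_id in data


def decode(posting):
    """Return the document IDs of a posting list in ascending order."""
    kind, data = posting
    if kind == "list":
        return list(data)
    ids, d = [], 0
    while data:
        if data & 1:
            ids.append(d)
        data >>= 1
        d += 1
    return ids


def search_shard(shard, terms):
    """AND query on one shard: walk the shortest list and probe the others."""
    postings = [shard.get(t) for t in terms]
    if not postings or any(p is None for p in postings):
        return []
    postings.sort(key=size)
    return [d for d in decode(postings[0])
            if all(contains(p, d) for p in postings[1:])]


def parallel_search(shards, terms):
    """Query every shard in its own thread and merge the sorted results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        parts = pool.map(lambda s: search_shard(s, terms), shards)
        return sorted(d for part in parts for d in part)


# Two hypothetical shards, e.g. one per document topic.
shards = [
    {"search": encode([1, 2, 3, 5, 8, 13]), "engine": encode([2, 8])},
    {"search": encode([20, 21]), "engine": encode([21, 22, 23, 24, 25, 26])},
]
result = parallel_search(shards, ["search", "engine"])
```

In CPython, threads mainly help when shard lookups wait on disk or network I/O; for purely in-memory, CPU-bound work the global interpreter lock limits the speedup, so this shows the structure rather than the performance the abstract reports.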