Optimizing top-k retrieval: submodularity analysis and search strategies  被引量:1

Optimizing top-k retrieval: submodularity analysis and search strategies

在线阅读下载全文

作  者:Chaofeng SHA Keqiang WANG Dell ZHANG Xiaoling WANG Aoying ZHOU 

机构地区:[1]School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 200433, China [2]Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China [3]Department of Computer Science and Information Systems, Birkbeck, University of London, London WCIE 7HX, UK

出  处:《Frontiers of Computer Science》2016年第3期477-487,共11页中国计算机科学前沿(英文版)

基  金:This work was supported by the National Natural Science Foundation of China (Grant Nos. 61572135 and 61170085), 973 project (2010CB328106), Program for New Century Excellent Talents in China (NCET-10-0388).

摘  要:The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user's query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k re- trieval problem in the framework of facility location analysis and prove he submodularity of that objective function which provides a theoretical approximation guarantee of factor 1 -1/ε for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first ob- tains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user's query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k re- trieval problem in the framework of facility location analysis and prove he submodularity of that objective function which provides a theoretical approximation guarantee of factor 1 -1/ε for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first ob- tains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.

关 键 词:top-k retrieval DIVERSIFICATION submodular function maximization 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程] TP311.13[自动化与计算机技术—控制科学与工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象