Technology Report of HIT.IRLab for Evaluation 2005 of 863 Information Retrieval


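Authors: Zhang Zhichang (张志昌)[1], Zhang Yu (张宇)[1], Gao Liqi (高立琦)[1], Yuan Xincheng (袁新成)[1], Hu Xiaoguang (胡晓光)[1], Liu Ting (刘挺)[1], Li Sheng (李生)[1]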

Affiliation: [1] Information Retrieval Lab, Harbin Institute of Technology, Harbin, Heilongjiang 150001

Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, Issue B03, pp. 83-90 (8 pages)

Funding: National Natural Science Foundation of China (grants 60435020, 60575042, 60503072)

Abstract: An initial set of relevant results is retrieved with Lucene, a tool based on the vector space model, by searching the body text of all web pages. This set is then reranked by Lemur, a language-model-based tool, to form a second result set. The two result sets are combined by linear interpolation, and the top 1000 documents of the fused list are returned as the final results. When formulating queries from topics, keywords are selected from the 〈title〉 and 〈desc〉 fields of each topic, and their weights are calculated using a modified tf*idf method. In the official evaluation on 50 topics, automatically constructed queries achieved MAP = 0.3107, P@10 = 0.624, and R-Precision = 0.3672; manually constructed queries achieved MAP = 0.3538, P@10 = 0.684, and R-Precision = 0.4078.
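The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract states only that the two result sets are combined by linear interpolation and truncated to the top 1000 documents. The interpolation weight `lam`, the min-max score normalization, the treatment of missing documents as score 0, and the function names below are all assumptions made for illustration, not the authors' reported settings.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so two engines become comparable.

    Assumption: the abstract does not say how (or whether) scores
    were normalized before fusion.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(first_pass: Dict[str, float],
         reranked: Dict[str, float],
         lam: float = 0.5,
         top_k: int = 1000) -> List[Tuple[str, float]]:
    """Linearly interpolate two score lists and keep the top_k documents.

    first_pass: doc id -> score from the vector space retrieval run.
    reranked:   doc id -> score from the language model reranking run.
    lam:        hypothetical interpolation weight for the first run.
    """
    a = min_max_normalize(first_pass)
    b = min_max_normalize(reranked)
    fused = {
        doc: lam * a.get(doc, 0.0) + (1.0 - lam) * b.get(doc, 0.0)
        for doc in set(a) | set(b)
    }
    # Sort by descending fused score; break ties by doc id for determinism.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

For example, calling `fuse({"d1": 12.0, "d2": 8.0, "d3": 4.0}, {"d1": -3.0, "d2": -1.0, "d3": -2.0})` ranks `d2` first, because it scores moderately well in the first run and best in the second. In practice `lam` would be tuned on training topics.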

Keywords: query construction; vector space model; language model; result fusion

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
