Technical Report of the HIT Information Retrieval Laboratory (HIT-IRLab) for the 2005 863 Information Retrieval Evaluation

Original title: 2005年863信息检索评测哈尔滨工业大学信息检索研究室技术报告

Authors: Zhang Zhichang[1], Zhang Yu[1], Gao Liqi[1], Yuan Xincheng[1], Hu Xiaoguang[1], Liu Ting[1], Li Sheng[1]

Source: Journal of Chinese Information Processing (《中文信息学报》), 2006, Issue B03, pp. 83-90 (8 pages)

Funding: National Natural Science Foundation of China (60435020, 60575042, 60503072)

Abstract: A rough set of relevant results is first retrieved from the body text of all web pages with Lucene, a tool based on the vector space model. This set is then reranked with Lemur, a language-model-based tool, to form a second result set. The two result sets are fused by linear interpolation, and the top 1000 documents of the fused list are returned as the final results. When formulating queries, keywords are selected from the 〈title〉 and 〈desc〉 fields of each topic and weighted using a modified tf*idf method. In the official evaluation on 50 topics, automatically constructed queries achieved MAP = 0.3107, P@10 = 0.624, and R-Precision = 0.3672; manually constructed queries achieved MAP = 0.3538, P@10 = 0.684, and R-Precision = 0.4078.

Keywords: query construction; vector space model; language model; result fusion

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
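
The two computational steps named in the abstract, tf*idf keyword weighting and linear-interpolation fusion of two result lists, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the abstract does not specify how tf*idf was modified, which interpolation weight was used, or how scores were normalized. The function names (`weight_keywords`, `min_max_normalize`, `fuse`), the smoothed textbook idf, the min-max normalization, and the default `lam=0.5` are all hypothetical, and the sketch does not call Lucene or Lemur.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def weight_keywords(title_terms: List[str], desc_terms: List[str],
                    doc_freq: Dict[str, int], num_docs: int) -> Dict[str, float]:
    """Plain tf*idf weights for terms taken from a topic's title and desc fields.

    Assumption: textbook tf*idf with add-one smoothing; the report's
    modified variant is not described in the abstract.
    """
    tf = Counter(title_terms) + Counter(desc_terms)
    return {term: count * math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
            for term, count in tf.items()}


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two engines' score scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(first_pass: Dict[str, float], reranked: Dict[str, float],
         lam: float = 0.5, top_k: int = 1000) -> List[Tuple[str, float]]:
    """Linear interpolation: lam * first-pass score + (1 - lam) * reranked score.

    Assumption: a document missing from one list contributes 0 for that list.
    Ties are broken by document id so the output order is deterministic.
    """
    a = min_max_normalize(first_pass)
    b = min_max_normalize(reranked)
    fused = {doc: lam * a.get(doc, 0.0) + (1.0 - lam) * b.get(doc, 0.0)
             for doc in set(a) | set(b)}
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

A caller would pass the first-pass scores and the reranked scores as dictionaries keyed by document id, then keep the returned list as the final ranking. Normalizing before interpolation matters in this sketch because vector-space scores and language-model log-likelihoods live on different scales.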
