Web Page Search Ranking Algorithm Based on a Domain Model  Cited by: 2

Web Page Re-Ranking Algorithm for Specific Domain Based on Domain Model

Authors: Pan Cheng (潘澄)[1], Wu Gongqing (吴共庆)[1], Li Lei (李磊)[1], Hu Xuegang (胡学钢)[1]

Affiliation: [1] School of Computer and Information, Hefei University of Technology, Hefei 230009

Source: Computer Systems & Applications (《计算机系统应用》), 2015, No. 11, pp. 107-114 (8 pages)

Funding: National High Technology Research and Development Program of China (863 Program) (2012AA011005)

Abstract: General-purpose search engines often suffer from topic drift: some of the returned results are unrelated to the domain that the query keywords belong to. This paper proposes TSRR (Topic Sensitive Re-Ranking), a web page re-ranking algorithm for a specific domain that approaches the topic-drift problem from a new perspective. TSRR builds a domain vector model, which represents the domain and is independent of the web page ranking, together with a web page information model; during retrieval it combines the two models to re-rank the search results. TSRR is evaluated on a dataset crawled for specific domains, using user satisfaction and precision as the criteria. The experimental results show that, compared with the classic Lucene-based ranking algorithm, TSRR improves user satisfaction by 17.3% and precision by 41.9% on average.

Keywords: domain model; web page information model; web page re-ranking

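Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP393.092 [Automation and Computer Technology: Computer Science and Technology]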
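
Illustrative sketch: the abstract describes re-ranking search results by combining a domain vector model with a per-page information model, but it does not give the formulas. The Python sketch below is therefore not the TSRR algorithm itself. It only illustrates the general idea under stated assumptions: the domain is a sparse term-weight vector, each page is represented by raw term counts, the two are compared by cosine similarity, and the result is blended with the original engine score through a hypothetical weight `alpha`. The names `cosine`, `rerank`, and `alpha` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, domain_vector, alpha=0.5):
    """Re-rank (doc_id, base_score, text) tuples by a blended score.

    Assumption: the blended score is
        alpha * normalized_base_score + (1 - alpha) * domain_similarity.
    This weighting is hypothetical, not taken from the paper.
    """
    if not results:
        return []
    # Normalize the engine's scores so they are comparable with a cosine value.
    max_base = max(score for _, score, _ in results) or 1.0
    rescored = []
    for doc_id, base_score, text in results:
        # Stand-in for a page information model: plain term counts.
        page_vector = Counter(text.lower().split())
        similarity = cosine(domain_vector, page_vector)
        combined = alpha * (base_score / max_base) + (1 - alpha) * similarity
        rescored.append((doc_id, combined))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy example of topic drift: the query "apple" in a technology domain.
    domain = {"apple": 1.0, "iphone": 1.0, "ios": 1.0}
    hits = [
        ("fruit", 3.0, "apple pie recipe with fresh apple fruit"),
        ("tech", 2.0, "apple iphone ios update review"),
    ]
    for doc_id, score in rerank(hits, domain):
        print(doc_id, round(score, 3))
```

In this toy example the off-domain page has the higher original score, yet the domain similarity term moves the in-domain page to the top. A real system would presumably use weighted terms (for example TF-IDF) and a tuned blending weight rather than the fixed values assumed here.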

 
