Research on Semantic Information Retrieval Based on Web Page Segmentation

Semantic Information Retrieval Study Based on Page Segmentation

Author: Shen Dafeng[1]

Source: Journal of Xichang University (Natural Science Edition), 2009, No. 4, pp. 57-61 (5 pages)

Abstract: Accurately capturing user intent and judging how relevant a web page is to the user's needs is an important research direction in information retrieval. This paper proposes a semantic information retrieval algorithm based on web page segmentation. Because web pages are semi-structured, the algorithm divides each page into topic regions (segments) according to its HTML tags and content. It first builds an HTML tag tree and then merges nodes in the tree using both content similarity and visual similarity. Given a user query, the retrieval and ranking algorithm uses this region information to retrieve and order the relevant pages. Experimental results show that the method significantly improves search precision.

Keywords: web page segmentation; semantic information retrieval; HTML tags; similarity

CLC number: TP391.3 [Automation and Computer Technology: Computer Application Technology]
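The abstract describes a three-stage pipeline (build an HTML tag tree, merge nodes by content and visual similarity, rank pages using the resulting regions) but does not give the paper's actual rules, formulas, weights, or ranking function. The Python sketch below only illustrates that general pipeline under loudly stated assumptions, and all names in it are hypothetical: the `BLOCK_TAGS` set used as region candidates; merging of blocks that are adjacent in document order (a simplification of node integration in the tree); a "visual similarity" proxy that only compares tag name and `class` attribute, since real visual similarity would need rendered layout features not available here; the weight `alpha` and `threshold`; regex tokenization (Chinese pages would need a word segmenter); and scoring each page by its best-matching segment. It uses only the standard library (`html.parser.HTMLParser`, `collections.Counter`).

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical choice: tags treated as candidate region (segment) boundaries.
BLOCK_TAGS = {"div", "p", "table", "ul", "ol", "section", "article", "td", "h1", "h2", "h3"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class Node:
    """A node of the HTML tag tree: tag name, attributes, own text, children."""
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.text = []
        self.children = []


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree using the stdlib HTML parser."""
    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (never the root).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text.append(data.strip())


def subtree_text(node):
    parts = list(node.text) + [subtree_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def has_block_descendant(node):
    return any(c.tag in BLOCK_TAGS or has_block_descendant(c) for c in node.children)


def leaf_blocks(node):
    """Block nodes with no nested block nodes, in document order."""
    if node.tag in BLOCK_TAGS and not has_block_descendant(node):
        return [node]
    return [b for c in node.children for b in leaf_blocks(c)]


def tf(text):
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def visual_sim(n1, n2):
    """Hypothetical proxy for visual similarity: same tag name and same class."""
    return 0.5 * (n1.tag == n2.tag) + 0.5 * (n1.attrs.get("class") == n2.attrs.get("class"))


def segment(html, alpha=0.5, threshold=0.7):
    """Merge adjacent leaf blocks whose weighted similarity reaches the threshold."""
    builder = TagTreeBuilder()
    builder.feed(html)
    segments = []  # (last merged node, term-frequency vector of the segment)
    for node in leaf_blocks(builder.root):
        vec = tf(subtree_text(node))
        if segments:
            last_node, last_vec = segments[-1]
            score = alpha * cosine(vec, last_vec) + (1 - alpha) * visual_sim(node, last_node)
            if score >= threshold:
                segments[-1] = (node, last_vec + vec)
                continue
        segments.append((node, vec))
    return [vec for _, vec in segments]


def rank(pages, query):
    """Score each page by its best-matching segment (hypothetical scoring rule)."""
    q = tf(query)
    scored = [(max((cosine(q, s) for s in segment(html)), default=0.0), url)
              for url, html in pages.items()]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [(url, round(score, 3)) for score, url in scored]


pages = {
    "a.html": "<div class='nav'><p>home news sports</p></div>"
              "<div class='main'><p>web page segmentation improves retrieval</p></div>",
    "b.html": "<div><p>retrieval of sports news</p><p>segmentation of images</p></div>",
}
print(rank(pages, "page segmentation retrieval"))
```

In this toy example, page `b.html` mentions the query terms only in two unrelated paragraphs, so scoring per segment rather than over the whole page keeps it below `a.html`, whose single main region matches the query closely. Adjacent blocks with similar wording and markup (for example `<p>web search</p><p>web search engine</p>`) are merged into one region.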
