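Affiliations: [1] College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang 310032, China; [2] College of Software, Zhejiang University of Technology, Hangzhou, Zhejiang 310032, China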
Source: Journal of Zhejiang University of Technology (《浙江工业大学学报》), 2009, No. 5, pp. 495-498 (4 pages)
Abstract: Search engines currently treat an entire webpage as the smallest unit for ranking, which makes ranking vulnerable to interference from noise (content unrelated to the page's topic). To address this problem, this paper proposes purifying webpages through webpage segmentation and then using the purified result to improve the conventional ranking algorithm. First, the vision-based page segmentation algorithm VIPS divides a webpage into several semantic blocks. Next, a set of rules retains the semantic blocks that are highly relevant to the page's topic. Finally, these blocks represent the entire webpage during retrieval. This reduces the effect of webpage noise on the correctness of the search engine's ranking algorithm and improves retrieval quality. Experiments show that the improved algorithm outperforms the original.
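The abstract does not state the block-retention rules or the ranking function, so the following is only a minimal sketch of the overall pipeline under stated assumptions. It assumes pages have already been split into semantic blocks (VIPS itself is not implemented here). It uses a hypothetical retention rule, keeping blocks whose cosine similarity to the page title reaches a threshold, and a hypothetical ranking score, query-term frequency computed only over the retained blocks. The names `Block`, `Page`, `purify`, `score`, and `rank` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """A semantic block, assumed to come from a VIPS-style segmenter (not implemented here)."""
    text: str


@dataclass
class Page:
    url: str
    title: str
    blocks: list[Block]


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; Chinese pages would need real word segmentation.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def purify(page: Page, threshold: float = 0.2) -> list[Block]:
    """Hypothetical rule: keep blocks whose cosine similarity to the title is >= threshold."""
    title_vec = Counter(tokenize(page.title))
    kept = [b for b in page.blocks if cosine(Counter(tokenize(b.text)), title_vec) >= threshold]
    # Fall back to the whole page if the rule would discard every block.
    return kept or page.blocks


def score(query: str, blocks: list[Block]) -> float:
    """Hypothetical score: share of tokens in the given blocks that are query terms."""
    terms = Counter(tokenize(" ".join(b.text for b in blocks)))
    total = sum(terms.values())
    if total == 0:
        return 0.0
    return sum(terms[q] for q in tokenize(query)) / total


def rank(query: str, pages: list[Page]) -> list[str]:
    """Rank page URLs using only each page's purified blocks."""
    scored = [(score(query, purify(p)), p.url) for p in pages]
    # Higher score first; URL breaks ties deterministically.
    return [url for _, url in sorted(scored, key=lambda x: (-x[0], x[1]))]


# Illustrative data (hypothetical): each page has one on-topic block and one noise block.
pages = [
    Page("a", "search engine ranking",
         [Block("search engine ranking uses link analysis"),
          Block("buy cheap phones now free shipping")]),
    Page("b", "cooking recipes",
         [Block("pasta recipes and sauces"),
          Block("search engine ranking search engine")]),
]
print(rank("search engine", pages))  # ['a', 'b']
```

In this toy data, page "b" would outrank page "a" if whole pages were scored, because its off-topic noise block repeats the query terms. Once the noise blocks are filtered out, the on-topic page "a" ranks first, which is the effect the abstract describes.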
Keywords: webpage noise; webpage segmentation; webpage purification; ranking algorithm; VIPS
Classification number (CLC): TN393.09 [Electronics and Telecommunications / Physical Electronics]