Author: Luo Yuan (罗元)[1]
Source: Modern Computer (《现代计算机》), 2013, No. 10, pp. 3-7, 12 (6 pages)
Abstract: With the rapid development of the Internet and the widespread use of search engines, web page data has become one of the major data sources for applications and research. Because of the particular nature of web pages, however, not all of the information a page contains is needed by those applications; advertisements and navigation bars are typical examples, and their presence adversely affects downstream use. In addition, web search results often contain redundant pages with identical content. Web page purification (noise removal) and web page deduplication are therefore basic problems in the use of web data, and they are also active research topics, so a summary of both fields is needed to support further in-depth study. Starting from the necessity of web page purification and deduplication, this paper defines and classifies the two tasks, surveys a range of purification and deduplication methods and frameworks, and discusses the existing problems and future directions of the field.
Classification number: TP393.092 [Automation and Computer Technology: Computer Application Technology]