检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
出 处:《中文信息学报》2003年第1期25-31,共7页Journal of Chinese Information Processing
基 金:国家自然科学基金资助项目 (6 0 0 82 0 0 3)
摘 要:定题检索将信息检索限定在特定主题领域 ,提供主题领域内信息的检索服务。它是新一代搜索引擎的发展方向之一。定题检索的关键技术是主题相关信息的搜索。本文提出了基于遗传算法的定题信息搜索策略 ,提高链接于内容相似度不高的网页之后的页面被搜索的机会 ,扩大了相关网页的搜索范围。同时 ,借助超链Metadata的提示信息预测链接页面的主题相关度 ,加快了搜索速度。对比搜索试验证明了算法具有较好的性能。The exponential growth of information available on the WWW makes it increasingly difficult to crawl and index the entire internet for general-purpose crawlers.Rather than collecting and indexing all accessible web documents to answer all possible ad-hoc queries,focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl,and avoids irrelevant regions of the Web.In this paper,a new focused crawling approach based on Generic Algorithm is proposed.The method electively seeks out pages that are relevant to a pre-defined set of topics using Generic Algorithm,increases the crawling chance of the web page following the web page with the low content-relevance,and broadens the relevant-searching scope of crawlers.Meanwhile,the hyperlink metadata is used to predict the topic-relevance of the web page pointed and quickens the information crawling.Experimental results indicate that our approach has better performance.
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.40