Authors: HUANG Li-bin (黄利斌); CHEN Hui (陈慧) (School of Information Science and Technology, Hunan Agricultural University, Changsha 410128, China)
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2019, No. 2, pp. 160-162 (3 pages)
Abstract: Topic crawlers (theme crawlers) have become an important means of information collection. In traditional topic-crawler techniques, the topic words and their relevance weights are fixed, so crawl precision falls and the error rate rises as the number of crawled pages grows. The topic crawler adopted in this paper uses a BP (back-propagation) neural network to update the topic words and their relevance weights dynamically, according to the features of the downloaded pages, so that crawl precision rises and the error rate falls as more pages are crawled. A topic crawler based on a BP neural network can improve the efficiency of information collection and reduce the losses caused by collection errors.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
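
The abstract describes updating topic-word relevance weights with a BP neural network as pages are downloaded, but this record gives no implementation details. The following is only a minimal sketch of that idea under stated assumptions: the keyword list, the relative-frequency features, the network size, the learning rate, and the relevance labels (assumed to come from some external feedback) are all hypothetical and are not taken from the paper.

```python
import math
import random

def features(text, keywords):
    """Relative frequency of each topic keyword in a whitespace-tokenised page."""
    tokens = text.lower().split()
    total = max(len(tokens), 1)
    return [tokens.count(k) / total for k in keywords]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyBPNet:
    """One hidden layer, one sigmoid output, trained by back-propagation.

    Hypothetical illustration only; the paper's actual architecture is unknown.
    """

    def __init__(self, n_in, n_hidden=3, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def score(self, x):
        """Estimated relevance of a page in the open interval (0, 1)."""
        return self.forward(x)[1]

    def update(self, x, target):
        """One online gradient step on squared error for a single page."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights from before this step.
        delta_hidden = [delta_out * w * hj * (1.0 - hj)
                        for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= self.lr * delta_out * hj
        self.b2 -= self.lr * delta_out
        for j, dj in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= self.lr * dj * xi
            self.b1[j] -= self.lr * dj

# Hypothetical topic words and labelled pages (1.0 = on topic, 0.0 = off topic).
KEYWORDS = ["crawler", "topic", "neural"]
PAGES = [
    ("topic crawler with neural network", 1.0),
    ("a neural crawler follows topic links", 1.0),
    ("dinner recipes for the weekend", 0.0),
    ("football results and league tables", 0.0),
]

net = TinyBPNet(n_in=len(KEYWORDS))
for _ in range(500):
    # Each downloaded page triggers one weight update.
    for text, label in PAGES:
        net.update(features(text, KEYWORDS), label)

for text in ("neural topic crawler design", "holiday travel tips"):
    print(text, "->", round(net.score(features(text, KEYWORDS)), 3))
```

In a crawler, `score` could rank candidate links before download, while `update` would run after each page whenever a relevance label is available; how the paper obtains such labels is not stated in this record.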