Application of Deep Learning Web Crawler Algorithms Based on Big Data in Information Collection and Processing

Author: YU Ping (Guangzhou Huanan Business College, Guangzhou, Guangdong Province, 510650, China)

Source: Science & Technology Information (《科技资讯》), 2024, No. 16, pp. 55-57 (3 pages)

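Funding: Guangdong Provincial Department of Education, 2023 Characteristic Innovation Project for Regular Higher Education Institutions in Guangdong Province, "Research and Optimization of Web Crawler Algorithms Based on Deep Learning" (Project No. 2023KTSCX407).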

Abstract: This article aims to optimize web crawler algorithms using big data and deep learning so that they better meet the needs of information collection and processing. First, big data technology is used for data collection. Second, Term Frequency-Inverse Document Frequency (TF-IDF) weights are introduced as the initial weights of the input features, and a propagation activation algorithm is used to optimize the crawler. Finally, multimodal information is integrated. To test the approach in information collection and processing, it was compared with traditional methods. In experiments with 10,000 Uniform Resource Locators (URLs), the proposed method reached a coverage of 92.9%, whereas the traditional methods reached only 73.7%. The results indicate that the proposed deep learning web crawler algorithm based on big data achieves higher coverage and better accuracy in information collection.
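
The abstract names TF-IDF weights as the initial weights of the input features but does not say which TF-IDF variant is used. As a minimal sketch, assuming the common unsmoothed form (term frequency multiplied by log(N / document frequency)), the weights for a small set of tokenized pages could be computed as follows. The function name `tf_idf` and the sample `pages` are hypothetical and are not taken from the article.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not confirmed by the article):
    weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized pages, for illustration only.
pages = [
    ["crawler", "deep", "learning"],
    ["crawler", "big", "data"],
    ["crawler", "deep", "crawler"],
]
weights = tf_idf(pages)
```

Under this variant a term that occurs in every page, such as "crawler" here, receives a weight of zero, so only terms that discriminate between pages would contribute a nonzero initial weight.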

Keywords: web crawler algorithm; deep learning; information collection and processing; big data

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
