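Author: YU Ping (于平) (Guangzhou Huanan Business College, Guangzhou, Guangdong Province, 510650 China)
Source: Science & Technology Information (《科技资讯》), 2024, No. 16, pp. 55-57 (3 pages)
Funding: 2023 Characteristic Innovation Project for Regular Higher Education Institutions of Guangdong Province, Department of Education of Guangdong Province, "Research and Optimization of Web Crawler Algorithms Based on Deep Learning" (Project No. 2023KTSCX407).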
Abstract: This study uses big data and deep learning techniques to optimize web crawler algorithms so that they better meet the needs of information collection and processing. First, big data technology is used for data collection. Second, Term Frequency-Inverse Document Frequency (TF-IDF) weights are introduced as the initial weights of the input features, and a spreading activation algorithm is used to optimize the crawler. Finally, multimodal information is integrated. To test how well the big-data-based deep learning crawler performs in information collection and processing, it was compared with traditional methods. In the experiments, with 10,000 Uniform Resource Locators (URLs), the proposed method reached a coverage of 92.9%, whereas traditional methods reached only 73.7%. The results indicate that the proposed algorithm achieves higher coverage and better accuracy in information collection.
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
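
The abstract mentions TF-IDF weights as the initial weights of the input features but does not state which TF-IDF variant the paper uses. The following is a minimal sketch under stated assumptions: term frequency is the count divided by document length, inverse document frequency is the natural logarithm of N/df, and the function name `tf_idf` and the sample `pages` are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not confirmed by the paper):
    tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized pages, for illustration only.
pages = [
    ["web", "crawler", "deep"],
    ["web", "crawler", "data"],
    ["web", "learning"],
]
initial_weights = tf_idf(pages)
```

Under this variant, a term that appears on every page (here `web`) receives weight 0, while a term unique to one page receives the largest weight on that page. How such weights feed into the network and the spreading activation step is not described in the abstract, so it is left out here.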