Optimization of Web Information Keyword Crawling Method Based on Chaotic Sequence

Authors: WANG Xiaoyu (王晓宇); WANG Pei (王培) (School of Computer & Software Engineering, SIAS University, Xinzheng, Henan 451150, China)

Affiliation: [1] School of Computer and Software Engineering, Zhengzhou SIAS University (郑州西亚斯学院), Xinzheng, Henan 451150

Source: Information & Computer (《信息与电脑》), 2023, No. 23, pp. 69-71 (3 pages)

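Funding: Henan Province 2021 Discipline and Major Construction Funding Project for Private Regular Institutions of Higher Education (Project No.: Jiao Ban Zheng Fa [2020] No. 179 (教办政法[2020]179号), Software Engineering).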

Abstract: Conventional methods for crawling web page information keywords obtain page information by extracting the page's Uniform Resource Locator (URL), and their keyword extraction is limited to text fields, which leads to low crawling accuracy. To address this, a web page information keyword crawling method based on chaotic sequences is proposed. First, the information crawling process is analyzed so that more detailed and complete information is extracted. Second, the web page information extraction sections are divided according to their different extraction principles. Finally, the chaotic sequence of the web page information is analyzed and the required keywords are extracted. Experimental results show that the proposed method achieves a crawling accuracy of about 96.8%, which is 6.92% higher than the traditional method.
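The abstract does not say which chaotic map the "chaotic sequence" is built on, nor how it is tied to keyword extraction. Purely as an illustration of what a chaotic sequence is, the sketch below assumes the logistic map with r = 4, a common textbook choice that the source does not confirm. The function name `logistic_sequence` and its parameters are hypothetical and are not taken from the paper.

```python
def logistic_sequence(x0: float, n: int, r: float = 4.0) -> list[float]:
    """Return n iterates of the logistic map x_{k+1} = r * x_k * (1 - x_k).

    Assumption: the logistic map stands in for the unspecified chaotic
    map in the abstract; this is not the authors' confirmed method.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)  # one step of the map
        values.append(x)
    return values


# Two seeds that differ by 1e-9 produce sequences that drift apart,
# which is the sensitivity to initial conditions typical of chaos.
seq_a = logistic_sequence(0.37, 50)
seq_b = logistic_sequence(0.37 + 1e-9, 50)
largest_gap = max(abs(a - b) for a, b in zip(seq_a, seq_b))
print(f"largest gap over 50 steps: {largest_gap:.3f}")
```

With r = 4 the iterates stay inside the interval [0, 1], so such a sequence could in principle serve as a bounded, seed-dependent ordering signal; how the paper actually uses it would have to be checked against the full text.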

Keywords: Python; web page information; information crawling; keyword extraction

Classification code: G642 [Cultural Sciences: Higher Education]
