Authors: WANG Xiaoyu (王晓宇); WANG Pei (王培) (School of Computer & Software Engineering, SIAS University, Xinzheng, Henan 451150, China)
Affiliation: [1] School of Computer & Software Engineering, SIAS University (郑州西亚斯学院), Xinzheng, Henan 451150, China
Source: Information & Computer (《信息与电脑》), 2023, No. 23, pp. 69-71 (3 pages)
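Funding: Henan Province 2021 Discipline and Specialty Construction Funding Project for Private Regular Higher Education Institutions (Project No.: 教办政法[2020]179号, Software Engineering).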
Abstract: Conventional methods for crawling keywords from web page information obtain that information by extracting the page's Uniform Resource Locator (URL), and keyword extraction is limited to text fields, which results in low crawling accuracy. To address this, a web page information keyword crawling method based on chaotic sequences is proposed. First, the information crawling process is analyzed so that more detailed and complete information can be extracted. Second, the web page information extraction sections are divided according to their different extraction principles. Finally, the chaotic sequence of the web page information is analyzed to extract the required keywords. Experimental results show that the proposed method achieves a crawling accuracy of about 96.8%, which is 6.92% higher than the traditional method, indicating comparatively high accuracy.
Classification No.: G642 [Cultural Sciences: Higher Education]