Deep Web Crawlers and Data Analysis Based on Reverse Technology  (Cited by: 2)

Authors: XING Yuqi; YANG Cheng [1] (School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China)

Affiliation: [1] School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, Yunnan, China

Source: Software Engineering (《软件工程》), 2023, No. 12, pp. 41-45 (5 pages)

Abstract: In the era of big data, demand for data collection is growing across industries. Much of this demand involves data protected by JavaScript encryption, and collecting such data still faces many bottlenecks. This paper uses JavaScript reverse-engineering crawler techniques to reconstruct the parameter encryption process and dynamically construct the Uniform Resource Locator (URL) for product reviews on a shopping website. This enables dynamic collection of review data for multiple products in a specified category and offers a new approach to collecting similarly encrypted data. Sentiment analysis of the collected LEGO review data with SnowNLP, a Python-based Chinese natural language processing (NLP) library, shows that about 66% of buyers gave positive reviews. The sentiment distribution is polarized: high scores are concentrated in 0.8~1.0 and low scores in 0.0~0.2. Word cloud analysis indicates that buyers pay particular attention to the appearance of the product's express packaging. These findings can serve as a reference for online sellers seeking to improve their business management.
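The sentiment summary in the abstract (share of positive reviews, concentration of scores in the top and bottom ranges) can be illustrated with a minimal sketch. This is not the authors' code. It assumes each review already has a score in [0, 1], such as the value returned by `SnowNLP(text).sentiments`, and it assumes a cutoff of 0.5 for "positive", which the abstract does not state. The function name `summarize_sentiment` and the sample scores are hypothetical.

```python
from collections import Counter


def summarize_sentiment(scores, positive_threshold=0.5, n_bins=5):
    """Return (share of positive scores, per-bin counts) for scores in [0, 1].

    positive_threshold=0.5 is an assumed cutoff, not one taken from the paper.
    """
    if not scores:
        raise ValueError("scores must not be empty")

    # Fraction of reviews whose score exceeds the assumed threshold.
    positive_share = sum(s > positive_threshold for s in scores) / len(scores)

    counts = Counter()
    for s in scores:
        # Clamp the index so that a score of exactly 1.0 falls in the last bin.
        idx = min(int(s * n_bins), n_bins - 1)
        counts[idx] += 1

    # Label bins by their range, e.g. "0.8-1.0" for the top bin when n_bins=5.
    bins = {}
    for i in range(n_bins):
        label = f"{i / n_bins:.1f}-{(i + 1) / n_bins:.1f}"
        bins[label] = counts[i]
    return positive_share, bins


# Made-up scores for illustration only; these are not data from the paper.
example_scores = [0.95, 0.88, 0.05, 0.91, 0.12, 0.99, 0.55, 0.83, 0.10]
share, bins = summarize_sentiment(example_scores)
print(f"positive share: {share:.2f}")
for label, count in bins.items():
    print(label, count)
```

A polarized distribution of the kind the abstract reports would show up here as large counts in the "0.0-0.2" and "0.8-1.0" bins and small counts in between.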

Keywords: deep web crawler; JavaScript encryption; reverse engineering; AJAX; data mining

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
