User Information Extraction Based on Web Crawler and Associated Big Data  Cited by: 5


Authors: LIU Duo-lin [1], LV Miao (Shenyang Ligong University, Shenyang, Liaoning 110159, China)

Affiliation: [1] Shenyang Ligong University, Shenyang, Liaoning 110159, China

Source: Computer Simulation, 2021, No. 8, pp. 482-486 (5 pages)

Funding: 2018 Liaoning Provincial Social Science Planning Fund, Youth Project (L18CGL004).

Abstract: When traditional methods filter user access records, they capture page behavior features incompletely, which lowers the success rate of user information collection. To address this, the paper proposes a user information extraction method that combines web crawling with associated big data. A web crawler assists the browser in fetching web pages; access patterns and browsed content are tallied to obtain historical behavior data; and the associated big data that users are interested in is mined. Behavior features are then given predicted scores and ranked by importance to produce a user information extraction list, and the page information in that list is filtered further to obtain resource information that reflects the user's interests. Comparative experiments on a 30-day mobile network traffic dataset show that, relative to traditional methods, the proposed method raises the information collection success rate, extracts more complete user information, improves extraction accuracy, and yields results more closely related to the user.
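
The score-and-rank step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's scoring function, so everything here is an assumption made for illustration: the `Visit` record, the `dwell_seconds` feature, the weights `w_count` and `w_dwell`, and the weighted-sum score itself are hypothetical and are not the authors' method.

```python
from collections import defaultdict
from typing import NamedTuple


class Visit(NamedTuple):
    user: str
    url: str
    dwell_seconds: float  # hypothetical behavior feature, not from the paper


def rank_pages(visits, user, w_count=0.5, w_dwell=0.5, top_k=3):
    """Score each page a user visited and return the top_k (url, score) pairs.

    The score is an assumed weighted sum of normalized visit count and
    normalized total dwell time; it is not the paper's scoring function.
    """
    counts = defaultdict(int)
    dwell = defaultdict(float)
    for v in visits:
        if v.user == user:
            counts[v.url] += 1
            dwell[v.url] += v.dwell_seconds
    if not counts:
        return []
    max_count = max(counts.values())
    max_dwell = max(dwell.values()) or 1.0  # avoid division by zero
    scores = {
        url: w_count * counts[url] / max_count + w_dwell * dwell[url] / max_dwell
        for url in counts
    }
    # Sort by descending score, breaking ties by URL so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

Calling `rank_pages(visits, "u1", top_k=2)` on a list of `Visit` records returns that user's two highest-scoring pages; the resulting list plays the role of the "extraction list" that the abstract says is filtered further.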

Keywords: web crawler; associated big data; user information extraction; web page; behavior features

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
