Authors: LIU Duo-lin (刘多林) [1]; LV Miao (吕苗)
Affiliation: [1] Shenyang Ligong University, Shenyang, Liaoning 110159, China
Source: Computer Simulation (《计算机仿真》), 2021, No. 8, pp. 482-486 (5 pages)
Funding: 2018 Liaoning Province Social Science Planning Fund, Youth Project (L18CGL004).
Abstract: When traditional methods filter user access records, they capture page behavior features incompletely, which lowers the success rate of user information collection. To address this, the paper proposes a user information extraction method that combines a web crawler with associated big data. Web crawler technology assists the browser in fetching web pages; access patterns and browsed content are tallied to obtain historical behavior data, from which associated big data of interest to the user is mined. Behavior features are then given predicted scores and ranked by importance to produce a user information extraction list, and the page information in that list is filtered further to yield resource information that reflects the user's interests. Comparative experiments on a 30-day mobile network traffic data set show that, relative to traditional methods, the proposed method raises the information collection success rate, extracts more complete user information, improves extraction accuracy, and produces results that are more closely correlated with the user.
Keywords: web crawler; associated big data; user information extraction; web pages; behavior features
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]