Authors: WANG Song [1]; LIU Hongji [2]; YE Xiaobo
Affiliations: [1] School of Economics & Management, Chuxiong Normal University, Chuxiong, Yunnan Province 675000; [2] Dept. of Management of State-owned Assets and Informationalization, Chuxiong Normal University, Chuxiong, Yunnan Province 675000
Source: Journal of Chuxiong Normal University (《楚雄师范学院学报》), 2020, No. 6, pp. 124-131 (8 pages)
Abstract: As the Internet has developed, the massive volume of information generated online has become an important asset, and users' expectations of information search have risen with it. Finding key information quickly and effectively remains a hard problem. General-purpose search engines meet basic search needs, but they cannot specifically serve users who care only about particular topics or fields, and search keywords alone often fail to describe what such users need. Taking a data mining and machine learning perspective, this study proposes a focused-crawler method with configurable topics, built on the open-source web crawler framework Heritrix. Starting from specified source sites, the crawler launches multi-threaded crawling under different crawl strategies, performs categorized search according to preset keywords and site-section information, and retrieves the information that best matches the stated conditions so that it can be selected, assessed, analyzed, and used. To a certain extent, this method addresses the query needs that general search engines leave unmet, improving user experience and retrieval efficiency.
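The crawl loop summarized in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' Heritrix-based implementation: the in-memory `PAGES` map stands in for real HTTP fetching, the names `TopicConfig`, `fetch`, `relevance`, and `focused_crawl` are hypothetical, and relevance is assumed to be a plain keyword count (the abstract does not specify the scoring or the section-based classification).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch and parse pages over HTTP instead.
PAGES = {
    "site/a": ("heritrix focused crawler news", ["site/b", "site/c"]),
    "site/b": ("sports results and weather", ["site/d"]),
    "site/c": ("crawler topic configuration crawler", ["site/d"]),
    "site/d": ("focused crawler evaluation", []),
}


@dataclass
class TopicConfig:
    """Configurable topic: seed sites plus preset keywords (assumed shape)."""
    seeds: list
    keywords: list
    min_score: int = 1      # pages scoring below this are treated as off-topic
    max_threads: int = 4    # size of the fetch thread pool


def fetch(url):
    """Return (text, links) for a URL; unknown URLs yield an empty page."""
    return PAGES.get(url, ("", []))


def relevance(text, keywords):
    """Score a page by counting whole-word keyword occurrences."""
    words = text.lower().split()
    return sum(words.count(k.lower()) for k in keywords)


def focused_crawl(config):
    """Crawl level by level, expanding links only from on-topic pages."""
    seen = set(config.seeds)
    frontier = list(config.seeds)
    hits = []
    with ThreadPoolExecutor(max_workers=config.max_threads) as pool:
        while frontier:
            # Fetch the whole frontier in parallel; map() preserves order.
            results = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, (text, links) in zip(frontier, results):
                score = relevance(text, config.keywords)
                if score < config.min_score:
                    continue  # off-topic page: neither kept nor expanded
                hits.append((url, score))
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    # Highest-scoring pages first, ties broken by URL.
    return sorted(hits, key=lambda h: (-h[1], h[0]))
```

Calling `focused_crawl(TopicConfig(seeds=["site/a"], keywords=["crawler", "focused"]))` keeps the pages that mention the keywords and skips `site/b`, whose text contains neither. Pruning links from off-topic pages is one possible crawl strategy; others, such as a priority queue ordered by score, would fit the same structure.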