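Affiliation: [1] School of Management, Shandong University, Jinan 250100
Source: 《统计与决策》 (Statistics & Decision), 2017, No. 14, pp. 178-181 (4 pages)
Funding: Major Project of the National Social Science Fund of China (15ZDB157); Key Project of the National Social Science Fund of China (12AZD098); Key Project of National Statistical Science Research, National Bureau of Statistics (2013LZ23)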
Abstract: Severe information asymmetry makes financing difficult and costly for small and micro enterprises and makes loans hard to obtain. Collecting information about these enterprises from Internet social media, one source of big data, is an important way to acquire data on them. Facing the explosive growth of Internet information resources, this paper uses topic-focused web crawler technology, database technology, and Java to design and implement an automatic acquisition system for small and micro enterprise statistical information. The system comprises three functional modules: URL filtering and collection based on link structure analysis, web page body text extraction based on template node matching, and data storage. The collected data is stored as structured data in a MySQL database to support subsequent data mining and analysis. The results show that the proposed system achieves relatively high collection efficiency and can meet the needs of small and micro enterprise statistical information collection.
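The three modules named in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name `SmeCollectorSketch`, the URL path rule, the template patterns, and the field names are all hypothetical, regular expressions stand in for real link-structure analysis and DOM template matching, and an in-memory list stands in for the MySQL table (a real system would presumably write through JDBC).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmeCollectorSketch {

    // Hypothetical link-structure rule: keep only article-style paths.
    static final Pattern ARTICLE_PATH = Pattern.compile("^/news/\\d{4}/\\d+\\.html$");
    static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    // Stand-in for the MySQL table (assumption: one row per extracted page).
    static final List<Map<String, String>> TABLE = new ArrayList<>();

    // Module 1: collect links from a page, keeping same-host URLs that match the rule.
    static Set<String> filterLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        Set<String> kept = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI link;
            try {
                link = base.resolve(m.group(1));
            } catch (IllegalArgumentException e) {
                continue; // skip malformed hrefs
            }
            if (base.getHost() != null
                    && base.getHost().equals(link.getHost())
                    && link.getPath() != null
                    && ARTICLE_PATH.matcher(link.getPath()).matches()) {
                kept.add(link.toString());
            }
        }
        return kept;
    }

    // Hypothetical page template: field name -> pattern for the node holding it.
    static Map<String, Pattern> defaultTemplate() {
        Map<String, Pattern> t = new LinkedHashMap<>();
        t.put("title", Pattern.compile("<h1 class=\"title\">(.*?)</h1>", Pattern.DOTALL));
        t.put("body", Pattern.compile("<div id=\"content\">(.*?)</div>", Pattern.DOTALL));
        return t;
    }

    // Module 2: extract each templated field and strip nested tags from it.
    static Map<String, String> extract(String html, Map<String, Pattern> template) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : template.entrySet()) {
            Matcher m = e.getValue().matcher(html);
            if (m.find()) {
                record.put(e.getKey(), m.group(1).replaceAll("<[^>]+>", "").trim());
            }
        }
        return record;
    }

    // Module 3: store a record only if every templated field was found.
    static boolean store(Map<String, String> record, Map<String, Pattern> template) {
        if (!record.keySet().containsAll(template.keySet())) {
            return false;
        }
        TABLE.add(record);
        return true;
    }

    public static void main(String[] args) {
        String index = "<a href=\"/news/2017/42.html\">story</a><a href=\"/about.html\">about</a>";
        System.out.println(filterLinks("http://example.com/index.html", index));

        String page = "<h1 class=\"title\">Acme Ltd</h1><div id=\"content\"><p>Hello</p></div>";
        Map<String, Pattern> template = defaultTemplate();
        Map<String, String> record = extract(page, template);
        System.out.println(store(record, template) + " " + record);
    }
}
```

Keeping the three stages as separate functions mirrors the modular split described in the abstract, so the filtering rule, the page template, or the storage backend could each be swapped without touching the others.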