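Affiliations: [1] School of Architecture, South China University of Technology, Guangzhou, Guangdong 510000; [2] Architectural Design and Research Institute of South China University of Technology Co., Ltd., Guangzhou, Guangdong 510000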
Source: Technology Innovation and Application (《科技创新与应用》), 2020, No. 33, pp. 1-5, 10 (6 pages in total)
Abstract: Continuously updating large volumes of reference material and data, in both practice and research, is a professional foundation for architects. Traditional manual Internet searching is labor-intensive and has a low yield, so the data sources that websites offer are often underused. Most Chinese architecture websites mark up their data as HTML text, and applying a focused web crawler to that HTML helps architects locate professional data efficiently and store it in a standardized form. By analyzing the structural features of mainstream architecture websites, three kinds of professional crawler requirements in architecture are summarized. Based on the language features of Python, two crawler strategies are proposed: one for public data and one for building archives. Test results show that the strategies offer good real-time data collection and are easy to manage and maintain, and that both run efficiently and stably, so they can provide more high-quality data sources for big data analysis in architecture.
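The abstract does not give the paper's code, so the following is only a minimal sketch of the general idea of a focused crawler over HTML, not the authors' public-data or building-archive strategy. It parses one page with Python's standard `html.parser` and keeps only links whose anchor text mentions a topic keyword. The names `LinkCollector`, `focused_links`, `SAMPLE_HTML`, the keywords, and the `example.org` URLs are all hypothetical, and no network request is made.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (url, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def focused_links(html, base_url, keywords):
    """Return links whose anchor text mentions any keyword (case-insensitive)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [(url, text) for url, text in parser.links
            if any(k in text.lower() for k in wanted)]


# Hypothetical page fragment standing in for a real architecture website.
SAMPLE_HTML = """
<ul>
  <li><a href="/projects/library-01">Campus Library project archive</a></li>
  <li><a href="/about">About us</a></li>
  <li><a href="https://example.org/data/land-prices">Land price data table</a></li>
</ul>
"""

if __name__ == "__main__":
    for url, text in focused_links(SAMPLE_HTML, "https://example.org",
                                   ["archive", "data"]):
        print(url, "-", text)
```

In a real crawler the kept links would be queued for fetching and the off-topic ones dropped, which is what distinguishes a focused crawler from a general one. Fetching, politeness delays, and storage are left out of this sketch.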