Research and construction of a web crawler based forest management knowledge collection system  (Cited by: 5)

Authors: LIU Jiancheng (刘建成)[1], WU Baoguo (吴保国)[1], CHEN Dong (陈栋)[1]

Affiliation: [1] School of Information Science and Technology, Beijing Forestry University, Beijing 100083

Source: Journal of Zhejiang A&F University (《浙江农林大学学报》), 2017, Issue 4, pp. 743-750 (8 pages)

Funding: National High Technology Research and Development Program of China ("863" Program) under the 12th Five-Year Plan (2012AA102003)

Abstract: To address the problem of accurately obtaining forest management knowledge from the Internet, this paper proposes and builds a forest management knowledge collection system. Based on an analysis of the knowledge collection problem, the system workflow, system modules, and database were designed; the web crawler rules were improved and constrained; and the crawler's workflow and algorithm are described. The system summarizes the characteristics of webpages on forest management topics, identifies collected content by means of a forest management feature vector, denoises the collected knowledge, extracts knowledge through intelligent rule matching, and removes duplicate knowledge by comparing fingerprints with Euclidean distance. Experimental results indicate that the collected knowledge has high topic relevance, high accuracy, and a low duplication rate, and that the system meets the requirements of a forest management decision support system.
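The duplicate-removal step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe how the fingerprints are built, so everything below is an assumption for illustration only: the vocabulary `VOCAB`, the term-frequency `fingerprint` function, and the `threshold` value are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical topic vocabulary; the paper's actual feature terms are not listed here.
VOCAB = ["forest", "thinning", "stand", "tending", "harvest", "regeneration"]


def fingerprint(text):
    """Normalized term-frequency vector over VOCAB (an assumed fingerprint scheme)."""
    counts = Counter(text.lower().split())
    vec = [counts[term] for term in VOCAB]
    total = sum(vec)
    return [v / total for v in vec] if total else [0.0] * len(VOCAB)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deduplicate(docs, threshold=0.1):
    """Keep a document only if its fingerprint is farther than `threshold`
    from every fingerprint kept so far. Documents containing no vocabulary
    term are treated as off-topic and skipped."""
    kept, prints = [], []
    for doc in docs:
        fp = fingerprint(doc)
        if not any(fp):
            continue  # no topic terms at all: treated as off-topic
        if all(euclidean(fp, p) > threshold for p in prints):
            kept.append(doc)
            prints.append(fp)
    return kept


docs = [
    "forest thinning improves stand growth",
    "Forest thinning improves stand growth today",
    "harvest and regeneration planning",
]
unique_docs = deduplicate(docs)
```

In this sketch the second document produces the same fingerprint as the first and is dropped, while the third is kept. A lower `threshold` keeps more near-duplicates; a higher one merges more aggressively.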

Keywords: forest management; forest management knowledge; knowledge base; knowledge collection; web crawler

Classification number: S750 [Agricultural Sciences: Forest Management]

 
