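Affiliation: [1] Northeast Agricultural University, Harbin 150030
Source: Journal of Agricultural Mechanization Research (《农机化研究》), 2014, No. 3, pp. 182-185 (4 pages)
Funding: National Natural Science Foundation of China (31101080)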
Abstract: When agricultural search engines retrieve information on a specific agricultural topic, they return a large volume of results with low precision. To address this, the paper studies and designs a system for collecting soybean-related web page resources, using the open-source search engine Nutch as the technical framework. Taking soybean information as the topic, it investigates topic relevance judgment techniques and, borrowing the field-partitioning idea of the BM25F model and building on the vector space model, proposes a soybean topic relevance judgment algorithm. The IKAnalyzer Chinese word segmentation toolkit is integrated into Nutch to implement the relevance judgment. Experimental results show that the algorithm significantly improves the accuracy of soybean topic web page collection.
Classification: S126 [Agricultural Science: Basic Agricultural Science]
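
The abstract describes the relevance judgment only in outline: page fields are weighted in the spirit of BM25F, and the page is compared with the topic in a vector space. The record does not give the paper's formula, so the following is only a minimal sketch of that general idea, not the authors' algorithm. The field names, the field weights, the threshold, and all function names are assumptions made for illustration, and the input is assumed to be already tokenized (the paper uses IKAnalyzer inside Nutch for that step).

```python
import math
from collections import Counter

# Hypothetical field weights: a term in the title counts for more than the
# same term in the body (the field-partitioning idea borrowed from BM25F).
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}


def weighted_term_vector(fields):
    """Merge per-field token lists into one term vector scaled by field weight."""
    vector = Counter()
    for name, tokens in fields.items():
        weight = FIELD_WEIGHTS.get(name, 1.0)
        for token in tokens:
            vector[token] += weight
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors (dict-like)."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_on_topic(fields, topic_vector, threshold=0.3):
    """Keep a page when its weighted vector is close enough to the topic vector.

    The threshold of 0.3 is an arbitrary illustrative value.
    """
    return cosine_similarity(weighted_term_vector(fields), topic_vector) >= threshold


# Toy data, not taken from the paper.
topic = {"soybean": 1.0, "yield": 1.0}
soy_page = {"title": ["soybean"], "body": ["soybean", "yield"]}
other_page = {"title": ["football"], "body": ["match"]}

print(is_on_topic(soy_page, topic))
print(is_on_topic(other_page, topic))
```

In a crawler, a check like this would sit between fetching and indexing, so that pages scoring below the threshold are dropped before they reach the index.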