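Authors: YANG Heping (杨和平), CHEN Yu (陈瑜), ZHANG Zhiqiang (张志强) [1] (Division of Data Services, National Meteorological Information Center, Beijing 100081, China; Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Gembloux Agro-Bio Technology, University of Liège, Gembloux 5030, Belgium)
Affiliations: [1] Division of Data Services, National Meteorological Information Center, Beijing 100081; [2] Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193; [3] Gembloux Agro-Bio Technology, University of Liège
Source: Computer Engineering and Applications (《计算机工程与应用》), 2017, No. 19, pp. 257-264 (8 pages)
Funding: Special Scientific Research Fund for Public Welfare Industry (Meteorology), major project (No. GYHY(QX)20150600-7); Fifth Youth Science and Technology Fund (No. NMICQJ201604)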
Abstract: Building an ontology-based vertical search engine for a single website requires collecting and organizing descriptors and the logical relations among them, which is labor-intensive. As a result, although the technical framework is mature, most website search functions still rely on character matching and lack word segmentation, query expansion, and relevance ranking of results, so they rarely hit the relevant content accurately. To address these problems, a vertical search system based on a concise website ontology was designed and implemented. Taking the China Meteorological Data Service Center (CMDC, http://data.cma.cn) as an example, the ontology was built with protégé from the site's navigation directory. On a Lucene-based framework, the ontology objects and the web page content are each segmented (with IKanalyzer) and indexed into an ontology object index and a web page index. At query time, the front end segments the query and first expands it in the ontology object index; the TF-IDF relevance of each expansion result is computed and ranked, and that value serves as the weight of the corresponding expanded ontology object. Each weight is then assigned dynamically to the objects obtained through a second round of semantic analysis with Jena. Finally, all weighted keywords are searched in the web page index, and the results are scored and ranked by relevance. Experimental results show that the system is simple to build, can expand and recommend related query content for users, and improves both the precision and the recall of site search.
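The query flow described in the abstract (expand the query in an ontology object index, weight the expansions by TF-IDF relevance, pass those weights to semantically related objects, then run the weighted keywords against the page index) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace `tokenize` stands in for a real segmenter such as IKanalyzer, the `RELATED` dictionary stands in for Jena-based semantic analysis, the TF-IDF formula is a simplified one rather than Lucene's scoring, and all names and sample data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def tfidf_scores(query_weights, docs):
    """Score docs (name -> token list) against weighted terms (term -> weight).

    Uses a simplified TF-IDF; only documents with a positive score are returned.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term, weight in query_weights.items():
            if tf[term]:
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += weight * (tf[term] / len(tokens)) * idf
        if score > 0:
            scores[name] = score
    return scores


def expand_query(query, ontology, related):
    """Expand the query in the ontology index and weight the added terms."""
    base = {term: 1.0 for term in tokenize(query)}
    weights = dict(base)
    # The relevance of each matched ontology object becomes its weight.
    for obj, score in tfidf_scores(base, ontology).items():
        # The same weight is handed on to related objects
        # (hypothetical stand-in for the Jena-based second expansion).
        for name in [obj] + related.get(obj, []):
            for term in ontology[name]:
                weights[term] = max(weights.get(term, 0.0), score)
    return weights


def search(query, ontology, related, pages):
    """Return page names ranked by weighted TF-IDF relevance."""
    scores = tfidf_scores(expand_query(query, ontology, related), pages)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data, loosely modeled on a meteorological data site.
ONTOLOGY = {
    "surface_data": tokenize("surface observation temperature precipitation"),
    "radar_data": tokenize("radar reflectivity"),
    "upper_air_data": tokenize("upper air sounding temperature"),
}
RELATED = {"surface_data": ["upper_air_data"]}
PAGES = {
    "page_surface": tokenize("daily surface temperature and precipitation dataset"),
    "page_sounding": tokenize("upper air sounding profiles from radiosonde stations"),
    "page_radar": tokenize("radar reflectivity mosaic"),
}

if __name__ == "__main__":
    print(search("precipitation", ONTOLOGY, RELATED, PAGES))
```

In this toy data, plain matching on "precipitation" reaches only the surface data page, whereas the expanded query also recommends the sounding page through the related ontology object, ranked below the direct match.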
CLC number: TP39 [Automation and Computer Technology: Computer Application Technology]