Design and Implementation of a Topic-Specific Search Engine for Digital Libraries  (Cited by: 1)

Design and implementation of search engine system for digital library


Authors: Lin Qidong (林其东)[1], Chen Chuanbo (陈传波)[2], Zheng Ledan (郑乐丹)[1], Zhang Yiman (张一曼)[3]

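Affiliations: [1] Library, Wenzhou University, Wenzhou, Zhejiang 325035, China; [2] School of Software Engineering, Huazhong University of Science and Technology, Wuhan 430074, China; [3] Oujiang College, Wenzhou University, Wenzhou, Zhejiang 325035, China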

Source: Application Research of Computers (《计算机应用研究》), 2009, No. 8, pp. 2952-2955 (4 pages)

Funding: Wenzhou University Scientific Research Fund (Project No. 2007L029)

Abstract: This paper proposes an overall system design for a topic-specific search engine for digital libraries. A preprocessing system selects seed sites of the highest possible quality and from them produces Web topic definition data. Coordinated by a system controller, the topic crawlers concurrently collect the Web resources recommended by the crawlers, and the downloaded resources then undergo text classification and topic identification. The downloaded Web resources are stored in a Web topic resource repository organized by discipline, indexed through a global information repository, and exposed through a general interface for topic-based retrieval. Building on the particular characteristics of digital libraries, the paper proposes a design for a multi-threaded topic crawler together with a novel URL topic-relevance pruning algorithm, EPR, which provides a key design for implementing a prototype topic-specific search engine for digital libraries. The final system was built by extending the open-source Lucene platform. Experimental results show that the work is effective; in particular, the proposed relevance-judgment algorithm EPR is innovative and of practical value in real application environments.

Keywords: digital library; topic; crawler; search engine; EPR algorithm

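CLC numbers: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP393 [Automation and Computer Technology: Control Science and Engineering]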
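The abstract does not describe how EPR works, so what follows is only a generic sketch of the idea it names: a multi-threaded topic crawler whose controller prunes URLs by topic relevance. It is not the authors' EPR algorithm. It assumes a hypothetical in-memory Web (`WEB`) in place of real HTTP downloads, a hypothetical weighted term list (`TOPIC`) standing in for the Web topic definition data, and plain cosine similarity against a fixed threshold as a swapped-in relevance test; `fetch`, `relevance` and `focused_crawl` are illustrative names. The controller loop pops a batch of URLs from a best-first frontier, hands the downloads to a thread pool, and then scores each page itself, following the outlinks only of pages judged on-topic.

```python
import heapq
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "Web": url -> (page text, outgoing links).
# Stands in for real HTTP fetching so the sketch runs offline.
WEB = {
    "seed": ("digital library search engine crawler", ["a", "b"]),
    "a": ("topic crawler for digital library resources", ["c", "d"]),
    "b": ("football scores and weather news", ["e"]),
    "c": ("library catalogue search and indexing", []),
    "d": ("cooking recipes", ["f"]),
    "e": ("digital library archive", []),
    "f": ("search engine indexing", []),
}

# Hypothetical topic definition (term -> weight); not taken from the paper.
TOPIC = {"digital": 1.0, "library": 1.0, "search": 0.8, "crawler": 0.8, "indexing": 0.5}


def tokens(text):
    """Lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(text, topic=TOPIC):
    """Cosine similarity between the page's term-frequency vector and the topic vector."""
    tf = Counter(tokens(text))
    dot = sum(tf[term] * weight for term, weight in topic.items())
    norm_doc = math.sqrt(sum(c * c for c in tf.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic.values()))
    if norm_doc == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_doc * norm_topic)


def fetch(url):
    """Simulated download; a real crawler thread would issue an HTTP request here."""
    return url, WEB.get(url, ("", []))


def focused_crawl(seeds, threshold=0.3, workers=4, batch_size=4, max_pages=100):
    """Best-first topic crawl: only pages scoring >= threshold have their links followed."""
    frontier = [(-1.0, s) for s in seeds]  # heapq is a min-heap, so priorities are negated
    heapq.heapify(frontier)
    seen = set(seeds)
    repository = {}  # url -> relevance score of pages kept as on-topic
    fetched = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and fetched < max_pages:
            # Controller: take the most promising URLs and download them in parallel.
            n = min(batch_size, len(frontier), max_pages - fetched)
            batch = [heapq.heappop(frontier)[1] for _ in range(n)]
            fetched += len(batch)
            for url, (text, links) in pool.map(fetch, batch):
                score = relevance(text)
                if score < threshold:
                    continue  # prune: off-topic page, its outlinks are not enqueued
                repository[url] = score
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        # Child links inherit the parent page's score as their priority.
                        heapq.heappush(frontier, (-score, link))
    return repository
```

With these sample data, `focused_crawl(["seed"])` keeps `seed`, `a` and `c`. Pages `e` and `f` are on-topic but are never fetched, because they are reachable only through the pruned pages `b` and `d`; this is the usual trade-off of relevance pruning, which saves downloads at the cost of missing some relevant pages behind off-topic ones.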
