一种基于科技查新的跨库检索去重算法被引量：2

A Duplicate Removal Algorithm of Cross-database Search Based on Sci-tech Novelty Retrieval

作　　者：郝慧[1]

出　　处：《现代图书情报技术》2015年第1期89-95,共7页New Technology of Library and Information Service

摘　　要：【目的】通过对科技查新中的跨库检索结果进行去重,提高查新检索效率。【方法】选取不同数据库检索记录中唯一性的特征四元组{论文名称,期刊名,发表时间,第一作者}信息,用改进的I-Match中的对比算法构建检索记录特征字串作为去重的计算依据。【结果】跨库检索去重算法对数据库检索结果进行初步分析和去重,提高查新检索效率。通过测试,算法去重准确率较高,而召回率受数据库收录信息完善度的影响,还有提高的空间。【局限】算法处理效果依赖于从数据库检索记录中提取特征四元组,由于不同数据库的检索返回结果存在差异,需要针对不同论文数据库定制检索记录特征抽取模板。【结论】通过实验测试,算法具有较高的去重准确率和处理效率,符合预定科技查新需求。[Objective] Remove the data redundancy of cross-database searching in sci-tech novelty retrieval and improve the retrieval efficiency. [Methods] Choose thesis names, serial titles, publication dates and first authors of search records from different databases and build the character strings of search records by modifying comparison algorithm related to I-Match as the evidence of duplicate removal. [Results] The duplicate removal algorithm can improve retrieval effeciency by analyzing and duplicating the retrieval results from different databases. The experient suggests the precision of algorithm is superior, while the recall of the algorithm could be improved by modifying database records. [Limitations] The treatment effect depends on four characters extracted from database search records, different feature extraction model of search records needed to be customized according to different thesis databases due to the search result diffenrence. [Conclusions] The experiment test suggests the algorithm has a decent precision of duplicate removal and treatment efficency, which accords with the requirement of sci-tech retreival.

关键词：跨库检索科技查新去重算法 I-Match

分类号：G252.62[文化科学—图书馆学] G252.7

参考文献：

正在载入数据...

二级参考文献：

正在载入数据...

耦合文献：

正在载入数据...

引证文献：

正在载入数据...

二级引证文献：

正在载入数据...

同被引文献：

正在载入数据...

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

一种基于科技查新的跨库检索去重算法被引量：2

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

高级检索检索式检索

时间限定

期刊范围

学科限定全选

高级检索 检索式检索

时间限定

期刊范围

学科限定全选

一种基于科技查新的跨库检索去重算法 被引量：2

我的收藏

参考文献：

二级参考文献：

耦合文献：

引证文献：

二级引证文献：

同被引文献：

相关期刊文献：

相关的主题

相关的作者对象

相关的机构对象

下载全文

用户登录

高级检索检索式检索

一种基于科技查新的跨库检索去重算法被引量：2