Personalized information filtering technique based on term-frequency statistics (cited by: 12)

Information filtering technique based on term-frequency

Affiliation: [1] College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, Heilongjiang, China

Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》), 2003, No. 1, pp. 63-67 (5 pages)

Funding: Supported by the Heilongjiang Province Youth Fund (Q00C037).

Abstract: Filtering Internet information to select the documents that best match a user's interests is an important problem for intelligent search engines. After introducing the basic principles of search engines, this paper proposes a method for document learning and for constructing a user's personal dictionary. The method comprises internal character code conversion, word segmentation, keyword extraction, construction of the personal dictionary, and adjustment of term weights. A personalized document filtering algorithm based on term-frequency statistics is then proposed. The algorithm improves on the traditional Vector Space Model (VSM) so that it better computes the relevancy between a document and the user's personal dictionary; documents are then filtered and ranked by relevancy according to the user's interests. Experimental data are given, and the results indicate that the method handles the filtering and ranking of Internet information for intelligent search engines well.
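
The pipeline outlined in the abstract (build a weighted personal dictionary, then score, filter, and rank documents against it) can be illustrated with a minimal sketch. The abstract does not specify the paper's improved VSM formula or its weight-adjustment rule, so this sketch substitutes plain cosine similarity over term frequencies and a simple blending update. The function names, the `rate` parameter, and the `threshold` value are all hypothetical, and documents are assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative frequency of each term in an already-segmented document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def update_dictionary(dictionary, tokens, rate=0.5):
    """Assumed weight adjustment: blend existing weights with the term
    frequencies of a document the user found interesting."""
    tf = term_frequencies(tokens)
    for term in set(dictionary) | set(tf):
        old = dictionary.get(term, 0.0)
        dictionary[term] = (1 - rate) * old + rate * tf.get(term, 0.0)
    return dictionary


def relevancy(dictionary, tokens):
    """Cosine similarity between a document's term-frequency vector and the
    personal dictionary's weight vector (standard VSM, not the paper's variant)."""
    tf = term_frequencies(tokens)
    dot = sum(w * tf.get(term, 0.0) for term, w in dictionary.items())
    norm_dict = math.sqrt(sum(w * w for w in dictionary.values()))
    norm_doc = math.sqrt(sum(v * v for v in tf.values()))
    if norm_dict == 0 or norm_doc == 0:
        return 0.0
    return dot / (norm_dict * norm_doc)


def filter_and_rank(dictionary, documents, threshold=0.1):
    """Drop documents scoring below the threshold, then sort by descending score."""
    scored = [(name, relevancy(dictionary, tokens)) for name, tokens in documents.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    profile = update_dictionary({}, ["search", "engine", "filter", "search"])
    docs = {
        "a": ["search", "engine", "ranking"],
        "b": ["cooking", "recipe"],
        "c": ["search", "search", "filter", "engine"],
    }
    for name, score in filter_and_rank(profile, docs):
        print(name, round(score, 3))
```

Here a document that shares no terms with the dictionary scores zero and is filtered out, while the remaining documents are ordered by how closely their term distribution matches the user's profile. A real system for Chinese text would also need the code-conversion and word-segmentation stages that the abstract lists before this step.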

Keywords: search engine; document filtering; vector space model; term-frequency statistics; personal dictionary

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
