Collaborative filtering algorithm evaluation for various datasets  Cited by: 22


Authors: DONG Li[1,2], XING Chunxiao[3], WANG Kehong[1]

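Affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084; [2] Tsinghua University Library, Beijing 100084; [3] Research Institute of Information Technology, Tsinghua University, Beijing 100084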

Source: Journal of Tsinghua University (Science and Technology), 2009, No. 4, pp. 590-594 (5 pages)

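Funding: National Natural Science Foundation of China (60473078); National "863" High-Tech Program (2006AA010101); National "Eleventh Five-Year Plan" Science and Technology Support Program (2006BAH02A12)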

Abstract: Most collaborative filtering (CF) research evaluates algorithms on a single dataset or on datasets with the same characteristics. This paper applies five typical CF algorithms, namely the User-based method (with 20 nearest neighbors), the Item-based method, the Item average method, the Item user average method, and the Slope One method, to two datasets with different user-item distribution characteristics, MovieLens and Book-Crossing. The results show that on MovieLens, where ratings are relatively dense, Slope One has the best prediction accuracy, whereas on Book-Crossing, where ratings are relatively sparse, the Item-based method has the best prediction accuracy and Slope One the worst. Because the algorithms rank differently on different datasets, a CF algorithm should be chosen according to the actual distribution of users and items.
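For readers unfamiliar with Slope One, the following is a minimal sketch of the weighted variant, not the implementation evaluated in the paper. The function names `fit_slope_one` and `predict`, the nested-dictionary rating format, and the toy ratings are all assumptions made for illustration. Each prediction relies on the average rating difference between item pairs among users who rated both, so item pairs with few or no co-rating users contribute little or nothing.

```python
from collections import defaultdict


def fit_slope_one(ratings):
    """Compute average rating deviations between item pairs.

    ratings: {user: {item: rating}} (hypothetical input format).
    Returns (dev, count), where dev[j][i] is the mean of (r_j - r_i)
    over users who rated both items, and count[j][i] is how many did.
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff_sum[j][i] += r_j - r_i
                    count[j][i] += 1
    dev = {
        j: {i: diff_sum[j][i] / count[j][i] for i in diff_sum[j]}
        for j in diff_sum
    }
    return dev, count


def predict(ratings, dev, count, user, item):
    """Weighted Slope One prediction; None if no co-rated item exists."""
    numerator = 0.0
    denominator = 0
    for i, r_i in ratings[user].items():
        if i == item:
            continue
        c = count.get(item, {}).get(i, 0)
        if c:
            # Weight each item's estimate by its number of co-rating users.
            numerator += (dev[item][i] + r_i) * c
            denominator += c
    return numerator / denominator if denominator else None


# Toy ratings, invented for illustration only.
toy = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = fit_slope_one(toy)
lucy_a = predict(toy, dev, count, "lucy", "A")
print(round(lucy_a, 2))
```

In this toy data, items A and B are co-rated by two users and items A and C by one, so the estimate derived from B receives twice the weight of the estimate derived from C.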

Keywords: collaborative filtering; personalized recommendation; algorithm evaluation

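Classification: TP391.4 [Automation and Computer Technology: Computer Application Technology]; TP311.13 [Automation and Computer Technology: Computer Science and Technology]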

 
