A recommendation algorithm integrating data mining and rating prediction  (Cited by: 1)

Authors: LIN Xiaoxuan, JI Yimu[1], LIU Shangdong[1], LI Lingjuan[1] (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)

Affiliation: [1] School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China

Source: Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition, 2024, No. 1, pp. 101-108 (8 pages)

Funding: Supported by the National Key R&D Program of China (2020YFB2104002) and the Key R&D Program of Jiangsu Province (BE2019740).

Abstract: To address the data sparsity, high similarity-computation cost, and limited accuracy of the traditional UserCF algorithm, a recommendation algorithm named DRR, which integrates data mining and rating prediction, is designed to improve recommendation accuracy, coverage, and time efficiency. First, PCA dimensionality reduction is applied to the user rating matrix, which is otherwise too large and sparse. Second, the Canopy algorithm processes the reduced matrix to obtain the number of clusters K. The K-means algorithm then clusters users with cosine similarity as the distance measure, and the Apriori algorithm mines potential association rules between items within the cluster, from which an item association factor is calculated. Finally, the other users in the target user's cluster are taken as its neighbors, and the target user's rating of an item is predicted from historical ratings, cosine similarity, and the item association factor; this reduces the time spent searching for nearest neighbors and also surfaces long-tail items. Comparative experiments on the MovieLens and Douban movie datasets show that DRR improves accuracy, recall, F1 score, coverage, and time efficiency relative to the UserCF algorithm, the K-means clustering based collaborative filtering algorithm, and the spectral clustering based collaborative filtering algorithm.
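
The clustering and prediction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the DRR formulas, so the thresholds `t1` and `t2`, the function names, and in particular the way the item association factor scales the prediction (multiplying by `1 + factor`) are assumptions made only for illustration. PCA, K-means, and Apriori are omitted; cluster labels and association factors are passed in as plain dictionaries.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def canopy(points, t1, t2):
    """Group point indices into canopies using cosine distance (1 - similarity).

    t1 is the loose threshold and t2 the tight one (t1 > t2). The number of
    canopies returned can serve as the K handed to K-means.
    """
    assert t1 > t2
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = {i: 1.0 - cosine(points[center], points[i]) for i in remaining}
        canopies.append([i for i in remaining if i == center or dist[i] < t1])
        # The center and every point within t2 of it cannot seed another canopy.
        remaining = [i for i in remaining if i != center and dist[i] >= t2]
    return canopies


def predict(ratings, clusters, assoc, user, item):
    """Score `item` for `user` from same-cluster neighbours only.

    ratings:  dict user -> list of ratings, 0 meaning "not rated"
    clusters: dict user -> cluster label (e.g. produced by K-means)
    assoc:    dict item -> association factor (hypothetical weighting, see lead-in)
    """
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or clusters[other] != clusters[user] or row[item] == 0:
            continue
        sim = cosine(ratings[user], row)
        num += sim * row[item]
        den += abs(sim)
    base = num / den if den else 0.0
    # Assumed combination rule: boost items that association rules link to the cluster.
    return base * (1.0 + assoc.get(item, 0.0))


if __name__ == "__main__":
    ratings = {"a": [5, 4, 0, 0], "b": [4, 5, 3, 0], "c": [0, 0, 5, 4]}
    k = len(canopy(list(ratings.values()), t1=0.6, t2=0.3))
    clusters = {"a": 0, "b": 0, "c": 1}
    print(k, predict(ratings, clusters, {2: 0.5}, "a", 2))
```

Restricting the neighbour search to one cluster is what reduces the similarity-computation cost relative to plain UserCF, since only users sharing the target user's cluster label are compared.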

Keywords: dimensionality reduction; clustering; association rules; long-tail items; rating prediction

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
