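Design and implementation of a personalized recommendation method for geoscience data based on hybrid filtering (基于混合过滤的地学数据个性化推荐方法设计与实现). Cited by: 6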

A hybrid personalized data recommendation approach for geoscience data sharing

Authors: WANG Mo (王末); ZHENG Xiaohuan (郑晓欢) [3]; WANG Juanle (王卷乐) [4,5,6]; BAI Yongqing (柏永青)

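Affiliations: [1] Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081, China; [2] Key Laboratory of Agricultural Big Data, Ministry of Agriculture, Beijing 100081, China; [3] Office of General Affairs, Chinese Academy of Sciences, Beijing 100864, China; [4] State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, CAS, Beijing 100101, China; [5] University of Chinese Academy of Sciences, Beijing 100049, China; [6] Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China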

Source: Geographical Research (《地理研究》), 2018, Issue 4, pp. 814-824 (11 pages)

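Funding: National Science and Technology Infrastructure Platform Construction Project (2005DKA32300); Chinese Academy of Sciences Characteristic Institute Cultivation and Construction Service Project (TSYJS03); China Knowledge Centre for Engineering Sciences and Technology Construction Project (CKCEST-2017-3-1); Agricultural Science Data Mining and Analysis Platform Research and Construction Project (JBYW-AII-2017-32); Chinese Academy of Agricultural Sciences Science and Technology Innovation Program (CAAS-ASTIP-2016-AII)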

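Abstract: Recommender systems are effective tools for helping Internet users cope with information overload. In geoscience data sharing, datasets carry richer spatial and temporal attributes than the content attributes of other items, which makes recommending geoscience data challenging. To address these characteristics, this study developed a dynamically weighted hybrid filtering method for geoscience data sharing recommendation services. The method predicts a user's interest in a dataset with collaborative filtering (CF) and content-based filtering (CBF) separately, then uses a trained model to compute the optimal weights and the final predicted rating. In the data acquisition stage, users' interest in datasets is derived from access logs by analyzing visit records with the Jenks Natural Breaks algorithm. In the CBF part, dataset similarity is computed from the spatial, temporal, and content attributes of the data, and user interest is computed from users' historical behavior. In both CF and CBF, the k-NN algorithm computes predicted ratings for datasets the user has not visited, and the two predictions are combined by weighted sum. Using the training set, the relationship between the ideal weight values and the users' co-rating level is modeled and fitted. The fitted model is then applied to adjust the hybrid filtering weights and obtain the optimal weighting equation. Test results show that the hybrid filtering method incorporating the spatiotemporal attributes of the data achieves significantly higher precision and recall than CF or CBF alone.

English abstract: Recommender systems are effective tools for helping Internet users mitigate information overload. In the geoscience data sharing domain, items (datasets) are more informative in terms of spatial and temporal attributes than regular items (e.g. books, movies, music), so building high-performance recommendation algorithms for geoscience data is more challenging. This study proposed an approach that combines content-based filtering with item-based collaborative filtering using dynamic weights. The approach exploits both the predictive ability of collaborative filtering and item content information to mitigate the data sparsity and early-rater problems. Users' ratings on items were first derived from their historical visiting time using Jenks Natural Breaks. In the CBF part, spatial, temporal, and thematic information of geoscience datasets was extracted to compute item similarity. Predicted ratings were computed with the k-NN method separately using CBF and CF, and then combined with dynamic weights. With the training dataset, we attempted to find the best model describing the relationship between ideal weights and users' co-rating level; a logarithmic function was identified as the best model. The model was then applied to tune the weights of CF and CBF on a user-item basis with the test dataset. Evaluation results showed that the dynamically weighted approach outperformed either CF or CBF alone in terms of Precision and Recall.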
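The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes the logarithmic model takes the form w = a·ln(1 + c) + b, where c is the co-rating level. The coefficients `a` and `b`, the function names, and the clipping of w to [0, 1] are hypothetical choices for illustration; the abstract does not give the fitted values or the exact definition of the co-rating level.

```python
import math


def cf_weight(co_rating_level, a=0.2, b=0.3):
    """Weight given to the CF prediction.

    Hypothetical logarithmic model w = a * ln(1 + c) + b, clipped to [0, 1].
    The coefficients a and b are placeholders, not the paper's fitted values.
    """
    w = a * math.log(1.0 + co_rating_level) + b
    return min(1.0, max(0.0, w))


def knn_predict(similarities, ratings, k=3):
    """Similarity-weighted mean rating over the k most similar neighbours."""
    neighbours = sorted(zip(similarities, ratings), reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return 0.0  # no usable neighbours
    return sum(s * r for s, r in neighbours) / total


def hybrid_score(cf_pred, cbf_pred, co_rating_level):
    """Weighted sum of the CF and CBF predictions for one user-item pair."""
    w = cf_weight(co_rating_level)
    return w * cf_pred + (1.0 - w) * cbf_pred
```

Under these assumptions, a user-item pair with a low co-rating level leans on the CBF prediction, and the CF prediction gains weight as more co-ratings become available, which matches the stated goal of mitigating data sparsity.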

Keywords: geospatial data; recommender system; hybrid filtering; scientific data sharing

Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
