Deep Reinforcement Learning-Based Recommendation Method with Positive and Negative Feedback State Representation

Authors: ZHANG Tao; ZHANG Zhijun[1]; CAO Jiawei; FAN Yumin; LIU Jiahui; YUAN Weihua[1]

Affiliation: [1] School of Computer Science and Technology, Shandong Jianzhu University, Ji'nan 250100, Shandong, China

Source: Software Guide (《软件导刊》), 2024, No. 12, pp. 27-35 (9 pages)

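Funding: Shandong Provincial Natural Science Foundation (ZR2021MF099, ZR2022MF334); Shandong Provincial Teaching Reform Research Project (M2021130, M2022245, Z2022202); Shandong Provincial High-Quality Professional Degree Teaching Case Library Construction Project (SDYAL2022155); Shandong Provincial Key R&D Program (Soft Science Project) (2021RKY03056); "Haiyou Plan" Industry Leading Talent Local Innovation Team Project (2023).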

Abstract: Deep reinforcement learning is by now well established in interactive recommender systems. However, few studies model the state representation explicitly, and those that do build the state only from the positive feedback sequence produced during user interaction. As a result, the recommender ignores both the latent relationships within the negative feedback sequence and the shifts in user interest that it reveals, so its recommendations tend to be one-sided. To address this gap, a recommender system framework based on contrastive learning and deep reinforcement learning, named Contrastive Learning and Deep Reinforcement Learning-Based Recommender System (CRLRS), is proposed. It includes a state representation module that models both the positive and the negative feedback sequences generated while the user interacts with the recommender. In addition, a contrastive auxiliary task is added to mitigate the sparsity of positive feedback data and to capture fine-grained differences between positive and negative feedback. Extensive experiments were conducted on two real-world datasets, MovieLens-100K and MovieLens-1M. HR@10 is 0.7052 and 0.4902 on the two datasets, respectively, and NDCG@10 is 0.4782 and 0.2715, respectively. The results show that the method clearly outperforms current state-of-the-art methods, which demonstrates the necessity of modeling positive and negative feedback simultaneously and of adding the contrastive auxiliary task in CRLRS, and confirms its better recommendation performance.
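The abstract reports HR@10 and NDCG@10 without stating the evaluation protocol. As a minimal sketch of how these two metrics are commonly computed, the following assumes binary relevance (an item is either relevant to the user or not) and a ranked list of recommended item IDs; the function names and the protocol are illustrative assumptions, not taken from the paper.

```python
import math


def hit_ratio_at_k(ranked_items, relevant_items, k=10):
    """Return 1.0 if any relevant item appears in the top-k list, else 0.0.

    Assumption: binary relevance; the paper's exact protocol is not given.
    """
    top_k = ranked_items[:k]
    return 1.0 if any(item in relevant_items for item in top_k) else 0.0


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """Binary-relevance NDCG@k: DCG of the list divided by the ideal DCG."""
    # A relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: one user, one relevant item ranked second.
ranked = [5, 3, 9]
relevant = {3}
hr = hit_ratio_at_k(ranked, relevant, k=10)
ndcg = ndcg_at_k(ranked, relevant, k=10)
```

Dataset-level scores of the kind quoted in the abstract would then be the mean of these per-user values over all test users, again under the assumed protocol.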

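Keywords: deep reinforcement learning; contrastive learning; recommender system; positive and negative feedback; state representation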

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
