Long and Short Interest Mining of Social User by Integrating Interest Topic Matrix and Topic Tree of Life (original title: 融合兴趣主题矩阵和主题生命树的社交用户长短兴趣挖掘)

Authors: Wu Shufang [1], Gao Mengjiao, Zhu Jie [2]

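Affiliations: [1] School of Management, Hebei University, Baoding, Hebei 071000, China; [2] College of Mathematics and Information Science, Hebei University, Baoding, Hebei 071000, China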

Source: Information Studies: Theory & Application (《情报理论与实践》), 2024, No. 2, pp. 161-169 (9 pages)

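Funding: An outcome of the Hebei Province Major Research Project in the Humanities and Social Sciences, "Research on the Network Governance Mechanism of Hebei Province Based on Big Data" (Project No. ZD202102).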

Abstract: [Purpose/significance] Existing methods for mining the interests of social media users perform poorly and pay little attention to the characteristics of different interest types. To address this, a new method for mining long-term and short-term interests is proposed. [Method/process] First, an interest value parameter is introduced as prior knowledge to improve the Labeled LDA topic model; the improved model is used to mine interest topics in different time windows, from which an interest topic matrix is constructed. Next, a topic tree of life is built from the patterns by which user interests change; it is used to mine the life characteristics of user interests and the latent associations among them, and to divide user interests into long-term, short-term, and expired interests. Finally, the weights of the different interest types are quantified from the intensity and fluctuation amplitude of the interest topics, giving an accurate representation of user interests. [Result/conclusion] Experiments use real data crawled from Sina Weibo as the training and test sets. Compared with existing interest mining methods, the proposed method improves F1 and MRR by up to 7.68% and 7.41%, respectively. [Limitations] The method is validated only on Weibo text data; cross-platform information and the practical effect of the different interest types in application scenarios are not explored in depth.
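For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how F1 and MRR are commonly computed for ranked interest predictions. The function names, the toy topic labels, and the choice to score a predicted topic set against a ground-truth set are assumptions made for illustration; the abstract does not describe the paper's exact evaluation protocol.

```python
from typing import Sequence, Set


def f1_score(predicted: Set[str], relevant: Set[str]) -> float:
    """Harmonic mean of precision and recall for one user's topic set."""
    if not predicted or not relevant:
        return 0.0
    hits = len(predicted & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         relevant_sets: Sequence[Set[str]]) -> float:
    """Average over users of 1 / rank of the first relevant topic (0 if none)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(rankings, relevant_sets):
        for rank, topic in enumerate(ranked, start=1):
            if topic in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data: hypothetical topic labels, not taken from the paper.
rankings = [["sports", "music", "travel"], ["food", "tech", "film"]]
truth = [{"music", "travel"}, {"food"}]

# Assumption: score the top-2 predicted topics of the first user.
f1_user0 = f1_score(set(rankings[0][:2]), truth[0])
mrr = mean_reciprocal_rank(rankings, truth)
```

In this toy case the first user's top-2 prediction shares one topic with a two-topic ground truth, and the first relevant topics appear at ranks 2 and 1 for the two users. The reported gains of 7.68% and 7.41% would be differences in scores of this kind between the proposed method and the baselines.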

Keywords: interest mining; long-term and short-term interests; topic model; interest topic matrix; topic tree of life

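Classification codes: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TP391.1 [Automation and Computer Technology: Computer Science and Technology]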

 
