Study on Three-level Short Video User Portrait Based on an Improved Topic Model Method


Authors: HUANG Yumin (黄玉民); ZHAO Chanchan (赵婵婵) [1]

Affiliation: [1] College of Information Engineering, Inner Mongolia University of Technology, Huhhot 010051, China

Source: Computer Science (《计算机科学》), 2024, Issue S01, pp. 686-692 (7 pages)

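Funding: Basic Scientific Research Funds for Universities Directly under the Inner Mongolia Autonomous Region (ZTY2023022, JY20230082); Inner Mongolia Autonomous Region Master's Student Research Innovation Project (S20231129Z); Natural Science Foundation of Inner Mongolia Autonomous Region (2023LHMS06016).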

Abstract: To address the problem of quickly extracting accurate user interests from massive short video data, user data, and interaction data, a three-level label user portrait construction method based on topic models is proposed. Within this topic construction method, the video topic words obtained by fusing the LDA and GSDMM topic models are used as user interest expression vectors. First, an LDA filter is built that removes topic-irrelevant text by comparison against a threshold, which shrinks the text and reduces the influence of non-primary corpus material on the generation of the interest expression vectors. Next, a method is proposed for constructing a feature word weight matrix that combines semantic and contextual information: a Bi-GRU neural network computes the contextual features of the word vectors, which serve as the context features, and the word frequency weights computed by the TF-IDF algorithm serve as the semantic features; combining the two enriches the meaning of the feature words. Finally, a GSDMM model with interest weight allocation learns the feature vector weight matrix, generating user interest tags and correcting interest weights according to the user's differing degrees of preference. Experimental results show that the method represents user portraits fairly completely and accurately, outperforms single topic construction methods, and performs well in clustering. Building a complete user portrait makes it possible to accurately grasp users' pain points and thereby support subsequent personalized recommendation.
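As a rough illustration of the final clustering step, the following is a minimal sketch of standard GSDMM (collapsed Gibbs sampling for the Dirichlet multinomial mixture model), which assigns each short text to exactly one topic cluster. It is not the interest-weighted variant described in the abstract, whose details are not given here. The function name `gsdmm`, the hyperparameter defaults, and the toy corpus are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Standard GSDMM via collapsed Gibbs sampling (no interest weighting).

    docs: list of token lists. Returns one cluster id per document.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    m = [0] * K                            # number of documents per cluster
    n = [0] * K                            # number of tokens per cluster
    nw = [Counter() for _ in range(K)]     # per-cluster word counts
    z = []                                 # cluster id of each document

    def move(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] += sign
        n[k] += sign * len(d)
        for w in d:
            nw[k][w] += sign

    for d in docs:                         # random initial assignment
        k = rng.randrange(K)
        z.append(k)
        move(d, k, +1)

    for _ in range(iters):
        for i, d in enumerate(docs):
            move(d, z[i], -1)              # take the document out of its cluster
            counts = Counter(d)
            logp = []
            for k in range(K):
                # Cluster popularity term; its denominator is constant in k.
                lp = math.log(m[k] + alpha)
                # Word similarity term, numerator.
                for w, c in counts.items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                # Word similarity term, denominator.
                for j in range(len(d)):
                    lp -= math.log(n[k] + V * beta + j)
                logp.append(lp)
            mx = max(logp)                 # stabilize before exponentiating
            weights = [math.exp(x - mx) for x in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
            move(d, z[i], +1)
    return z


# Toy corpus (hypothetical): two pet-related and two sports-related texts.
docs = [["cat", "pet", "cute"], ["dog", "pet", "cute"],
        ["goal", "match", "team"], ["team", "match", "score"]]
print(gsdmm(docs, K=4))
```

Because each document receives a single cluster, this model suits short texts such as video titles or tags. In a pipeline like the one the abstract outlines, the raw token counts would presumably be replaced or scaled by the combined TF-IDF and Bi-GRU feature weights, but how that is done is not specified in this abstract.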

Keywords: short video; user portrait; topic analysis model; semantic weight; context weight

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
