基于改进LDA的社会化标签主题识别方法  

The Topic Recognition Method of Socialized Tags Based on Improved LDA


Authors: TAI Yue (邰悦), GE Bin (葛斌)[1], LI Huizong (李慧宗)

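Affiliations: [1] School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001, China; [2] School of Computer Science and Technology, Nanyang Normal University, Nanyang, Henan 473061, China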

Source: Journal of Anhui University of Science and Technology (Natural Science), 2021, No. 5, pp. 55-63 (9 pages)

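Funding: National Natural Science Foundation of China (51874003, 61703005); Humanities and Social Sciences Research Youth Fund of the Ministry of Education (13YJCZH077); Natural Science Foundation of Anhui Province (1808085MG221).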

Abstract: Resources in social tagging exhibit independent and identically distributed (i.i.d.) characteristics, and the tags attached to a resource act as a special kind of semantic content for that resource. To address the i.i.d. characteristics of resources and the problem of feature-word sampling, this paper proposes Joint Feature Word Weighting-LDA, a joint topic recognition method over resource content and tags. First, a latent relationship between comments and their corresponding tags is established under an information entropy similarity condition. A random walk over this relationship yields weight coefficients for each group of resources and each group of tags, which eliminates the i.i.d. property among resources. These weights are then applied to the feature words of each resource, giving weight coefficients for both resource feature words and tag feature words. On this basis, the Joint Feature Word Weighting-LDA model is constructed, and the latent topic knowledge of social tag resources is obtained through iterative learning. Experiments show that the proposed Joint Feature Word Weighting-LDA achieves better topic recognition than other topic models.
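The weighting step described in the abstract (an entropy-based similarity, followed by a random walk that assigns a weight to each resource) can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not define its information entropy similarity or its random walk, so the sketch assumes Jensen-Shannon similarity as a stand-in and a damped, PageRank-style walk. The function names, the damping value of 0.85, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab):
    """Relative frequency of each vocabulary word in one document."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab) or 1
    return [counts[w] / total for w in vocab]


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2); an assumed stand-in for the
    paper's information entropy similarity. The value lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return max(0.0, 1.0 - 0.5 * (kl(p, m) + kl(q, m)))


def random_walk_weights(sim, damping=0.85, tol=1e-10, max_iter=1000):
    """Stationary distribution of a damped random walk on a similarity graph."""
    n = len(sim)
    # Row-normalise similarities (self-loops removed) into transition
    # probabilities; an isolated node jumps uniformly to any node.
    trans = []
    for i in range(n):
        row = [sim[i][j] if i != j else 0.0 for j in range(n)]
        s = sum(row)
        trans.append([v / s for v in row] if s > 0 else [1.0 / n] * n)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(w[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new
    return w


# Toy data (hypothetical): two related resources and one unrelated resource.
docs = [
    ["topic", "model", "lda", "tag"],
    ["tag", "topic", "lda", "weight"],
    ["football", "match", "goal", "team"],
]
vocab = sorted({w for d in docs for w in d})
dists = [word_distribution(d, vocab) for d in docs]
sim = [[js_similarity(p, q) for q in dists] for p in dists]
weights = random_walk_weights(sim)
```

In this toy setup the two overlapping resources receive equal weights that are larger than the weight of the unrelated one, and the weights sum to 1. In a weighted LDA variant, such per-resource weights could scale the word-count increments during sampling, but how the paper applies them to feature words is not specified in the abstract.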

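Keywords: social tags; information entropy similarity; independent and identically distributed; weighting method; latent Dirichlet allocation (LDA)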

Classification number: TP399 [Automation and Computer Technology: Computer Application Technology]

 
