Web Clustering Based on a Hybrid Probabilistic Latent Semantic Analysis Model  (Cited by: 2)


Authors: WANG Zhihe (王治和)[1], WANG Lingyun (王凌云)[2], DANG Hui (党辉)[1], PAN Lina (潘丽娜)[1]

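Affiliations: [1] College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070; [2] Science and Technology Department, Bank of Lanzhou, Lanzhou 730030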

Source: Journal of Computer Applications (《计算机应用》), 2012, No. 11, pp. 3018-3022 (5 pages)

Abstract: In e-commerce applications, to better understand users' intrinsic characteristics and devise effective marketing strategies, this paper proposes a Web clustering algorithm based on a Hybrid Probabilistic Latent Semantic Analysis (H-PLSA) model. Separate Probabilistic Latent Semantic Analysis (PLSA) models are built for user browsing data, page content information, and content-enhanced user transaction data. The three PLSA models are then merged through the log-likelihood function to obtain an H-PLSA model for user clustering and an H-PLSA model for page clustering. In the cluster analysis, similarity is computed from the conditional probabilities between latent topics and users, pages, and sites, and clustering uses the distance-based k-medoids algorithm. The H-PLSA model was designed and constructed, and the Web clustering algorithm was verified on it; the results indicate that the algorithm is feasible.
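The final clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance measure, the way the three log-likelihoods are weighted, or any model parameters, so everything below is an assumption for illustration: each user is represented by a hypothetical vector of conditional probabilities P(z|u) over two latent topics, distance is taken to be Euclidean, and k-medoids is implemented in its simple alternating (assign, then update medoid) form. The function names `euclidean` and `k_medoids` and the sample data are invented here and do not come from the paper.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two probability vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_medoids(vectors, k, max_iter=100, seed=0):
    """Cluster vectors around k medoids using alternating assignment/update."""
    rng = random.Random(seed)
    n = len(vectors)
    # Precompute all pairwise distances once.
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    medoids = rng.sample(range(n), k)

    def assign(current):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimises total distance to its cluster members.
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical P(z|u) rows for six users over two latent topics.
user_topics = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
]
medoids, labels = k_medoids(user_topics, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

In this toy input the first three users favour one topic and the last three favour the other, so the two groups separate. The same routine would apply to page vectors P(z|p) under the same assumptions.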

Keywords: Web clustering; probabilistic latent semantic analysis (PLSA); latent topic; k-medoids algorithm

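Classification codes: TP393.092 [Automation and Computer Technology: Computer Application Technology]; TP311.13 [Automation and Computer Technology: Computer Science and Technology]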

 
