Biased Parameter Estimation for LDA

Authors: Yuan Boqiu (袁伯秋)[1], Zhou Yimin (周一民)[1], Li Lin (李林)[1]

Affiliation: [1] School of Computer Science, Beihang University, Beijing 100191

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2010, No. 4, pp. 647-650 (4 pages)

Funding: Supported by the National Key Basic Research Development Program of China ("973" Program), grant 2007CB310803

Abstract: Latent Dirichlet Allocation (LDA) and related latent-topic models are increasingly used for processing discrete data. However, LDA places a Dirichlet distribution over the latent topics, which does not capture the correlations between topics well. Common remedies describe these correlations with a directed acyclic graph (DAG) or with other distributions such as the logistic normal distribution; they are effective but relatively expensive. This paper instead uses biased parameter estimation: it accounts for the overlap of terms between topics during topic mixing and changes the term distribution within each topic, which makes the topics more independent than in standard LDA and improves model performance. After reviewing the necessary background, the paper presents the biased estimation method within the LDA framework, together with a simplified way of computing it. Ad hoc information retrieval experiments show that biased estimation outperforms the basic EM method. Finally, the influence of several model parameters is analyzed briefly.

Keywords: topic model; LDA; biased parameter estimation; WordNet

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
