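Affiliation: [1] School of Computer Science, Beihang University, Beijing 100191
Source: Journal of Chinese Computer Systems, 2010, No. 4, pp. 647-650 (4 pages)
Funding: Supported by the National Basic Research Program of China ("973" Program) (2007CB310803)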
Abstract: LDA (Latent Dirichlet Allocation) and other latent-topic models are increasingly used for processing discrete data. However, LDA places a Dirichlet distribution over the latent topics, which does not capture the correlations between topics well. Common remedies express topic correlations through a DAG (directed acyclic graph) or through other distributions such as the logistic normal distribution; these are effective but relatively expensive. This paper instead uses biased parameter estimation: it accounts for the overlap of terms between topics during topic mixing and adjusts the term distribution within each topic, which makes the topics more independent than in standard LDA and thereby improves model performance. After reviewing the background, the paper presents biased parameter estimation within the LDA framework together with a simplified method for computing it. Ad hoc information retrieval experiments with LDA show that the biased estimation outperforms the basic EM method, and the paper closes with a preliminary analysis of how the model parameters should be chosen.
Keywords: topic model; LDA; biased parameter estimation; WordNet
CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
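For readers unfamiliar with the baseline that the abstract modifies, below is a minimal sketch of the standard LDA generative process. It does not implement the paper's biased parameter estimation, whose details are not given in this record. The function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`) and the symmetric priors `alpha` and `beta` are illustrative assumptions. It shows where the Dirichlet prior over topic proportions enters, which is the component the abstract identifies as unable to express correlations between topics.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return index i with probability probs[i]."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha, beta, seed=0):
    """Standard LDA generative process (illustrative sketch, not the paper's biased variant).

    alpha: symmetric Dirichlet prior on per-document topic proportions theta_d (assumed value).
    beta:  symmetric Dirichlet prior on per-topic word distributions phi_k (assumed value).
    Returns (phi, docs), where each doc is a list of (topic, word) index pairs.
    """
    rng = random.Random(seed)
    # One word distribution per topic: phi_k ~ Dir(beta)
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Topic proportions for this document: theta_d ~ Dir(alpha).
        # Components of a Dirichlet draw can only be negatively correlated, so this
        # prior cannot express that two topics tend to co-occur.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)   # topic assignment z ~ Mult(theta_d)
            w = sample_categorical(phi[z], rng)  # word w ~ Mult(phi_z)
            doc.append((z, w))
        docs.append(doc)
    return phi, docs
```

For example, `phi, docs = generate_corpus(num_docs=2, doc_len=5, num_topics=3, vocab_size=10, alpha=0.5, beta=0.1)` produces three topic-word distributions and two five-token documents. Estimation methods such as the EM baseline mentioned in the abstract work in the reverse direction, recovering `phi` and `theta` from observed words.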