Semantic tagging of a topic model based on associated words (基于关联词的主题模型语义标注)  Cited by: 3

Authors: ZHOU Yipeng[1], DU Junping[2]

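Affiliations: [1] School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048; [2] Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876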

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2012, No. 4, pp. 327-332 (6 pages)

Funding: National "973" Program of China (2012CB821206); National Natural Science Foundation of China (91024001; 61070142); Beijing Natural Science Foundation (4111002)

Abstract: Probabilistic topic models are widely used to describe topics in Internet topic analysis, but their semantics are difficult for ordinary users to understand. This paper proposes an automatic semantic tagging method for probabilistic topic models. First, association rule mining based on semantic categories is used to extract associated topic words, which form the candidate tag set. Next, a relevance function designed from the probability distribution of the associated words in the data set computes the semantic relevance between each candidate tag and the topic model. Finally, maximal marginal relevance is used to select tags with high semantic coverage and discrimination. Experiments on tagging topic models in the food safety and tourism domains show that, compared with tagging by maximum-probability topic words, the proposed method clearly improves tagging accuracy and resolves the problem of a single semantic category in multi-tag labeling, expressing richer semantics with fewer tags. This helps achieve more accurate topic tracking and topic information retrieval.
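The final step named in the abstract, tag selection by maximal marginal relevance (MMR), can be illustrated with a minimal sketch. This is not the authors' implementation: the relevance scores, the same-category similarity function, the trade-off weight `lam`, and all tag names below are hypothetical values chosen only to show how MMR trades relevance against redundancy. The paper's own relevance function is built from associated-word probabilities and is not reproduced here.

```python
from typing import Callable, Dict, List


def select_tags_mmr(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick up to k tags by maximal marginal relevance.

    Each round picks the tag maximising
        lam * relevance(tag) - (1 - lam) * max similarity to already chosen tags.
    """
    selected: List[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < k:

        def mmr_score(tag: str) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(tag, s) for s in selected), default=0.0)
            return lam * relevance[tag] - (1.0 - lam) * redundancy

        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidate tags for a food-safety topic (illustrative only).
relevance = {"melamine": 0.9, "additive": 0.8, "recall": 0.6}
category = {"melamine": "substance", "additive": "substance", "recall": "event"}


def same_category(a: str, b: str) -> float:
    # Assumed similarity: 1.0 if two tags share a semantic category, else 0.0.
    return 1.0 if category[a] == category[b] else 0.0


print(select_tags_mmr(relevance, same_category, k=2, lam=0.5))
# → ['melamine', 'recall']
```

With `lam=1.0` the redundancy term vanishes and the selection reduces to picking the top-scoring tags, which here would return two tags from the same semantic category. That is the single-category behaviour the abstract says the MMR step is meant to avoid.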

Keywords: topic analysis; semantic tagging; generative model; associated words; association rules

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
