Microblog Semantic Retrieval Based on Latent Semantic and Graph Structure

Cited by: 4


Authors: Xiao Bao (肖宝)[1], Li Pu (李璞)[2,3], Hu Jiaojiao (胡娇娇)[2], Jiang Yuncheng (蒋运承)[2]

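Affiliations: [1] School of Electronic and Information Engineering, Qinzhou University, Qinzhou, Guangxi 535000; [2] School of Computer Science, South China Normal University, Guangzhou 510631; [3] Software Engineering College, Zhengzhou University of Light Industry, Zhengzhou 450000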

Source: Computer Engineering (《计算机工程》), 2017, No. 6, pp. 182-188, 194 (8 pages)

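Funding: National Natural Science Foundation of China (61272066); Guangxi Universities Project for Improving the Basic Abilities of Young and Middle-aged Teachers (KY2016LX431); Guangzhou Science and Technology Program (2014J4100031); Qinzhou Scientific Research and Technology Development Program (20164407)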

Abstract: Microblog posts are short, have sparse features, and are separated from user queries by a semantic gap, all of which reduce the effectiveness of semantic retrieval. To address this, a semantic retrieval model based on latent semantics and graph structure is proposed, combining text features with knowledge-base semantics. First, the Tversky algorithm is used to compute feature relatedness, with hashtags as the features. Second, a topic model is trained on a Wikipedia corpus with Latent Dirichlet Allocation (LDA), texts are mapped onto this model, and their topic relatedness is computed from the JSD distance. Third, entities and the graph of their network relations are extracted from DBpedia, and SimRank is used to compute the relatedness between entities in that graph. The three results are combined into a final relatedness score. Experiments on Twitter subsets with short and long queries show that, compared with a method based on linked open data and graph theory, the proposed model improves MAP, P@30, and R-Prec by 2.98%, 6.40%, and 5.16%, respectively, indicating better retrieval performance.

Keywords: microblog; text relatedness; graph structure; Latent Dirichlet Allocation; semantic retrieval

Classification code: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
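
The three relatedness measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it reads "JSD" as the Jensen-Shannon divergence, uses the textbook forms of the Tversky index and SimRank, and the function names, the parameters `alpha`, `beta`, `c`, and `iterations`, the conversion `1 - JSD`, and the equal-weight combination in `combined_relatedness` are all assumptions made for illustration. The abstract does not say how the three scores are weighted.

```python
import math
from itertools import product


def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky index between two sets, e.g. the hashtag sets of a query and a post.

    alpha and beta weight the elements unique to each side (assumed values).
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def jsd(p, q):
    """Jensen-Shannon divergence (log base 2, so the value lies in [0, 1]).

    p and q are topic distributions of equal length, e.g. from an LDA model.
    """
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbors maps every node to the list of its in-neighbors; each
    in-neighbor must itself be a key. c is the decay factor (assumed value).
    """
    nodes = list(in_neighbors)
    sim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for u, v in product(nodes, nodes):
            if u == v:
                new[(u, v)] = 1.0
                continue
            iu, iv = in_neighbors[u], in_neighbors[v]
            if not iu or not iv:
                new[(u, v)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in iu for y in iv)
            new[(u, v)] = c * total / (len(iu) * len(iv))
        sim = new
    return sim


def combined_relatedness(hashtag_rel, topic_rel, entity_rel,
                         weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical weighted sum of the three scores (weights are assumed)."""
    w1, w2, w3 = weights
    return w1 * hashtag_rel + w2 * topic_rel + w3 * entity_rel


if __name__ == "__main__":
    h = tversky({"nlp", "search"}, {"search", "twitter"})
    t = 1.0 - jsd([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])  # divergence -> relatedness
    graph = {"A": [], "B": ["A"], "C": ["A"]}          # toy graph: A -> B, A -> C
    e = simrank(graph)[("B", "C")]
    print(h, t, e, combined_relatedness(h, t, e))
```

The naive SimRank loop visits every node pair in each iteration, so it is only suitable for small toy graphs; a graph extracted from DBpedia would need a pruned or approximate variant.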

 
