Affiliation: [1] College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054
Source: Computer Engineering and Design (《计算机工程与设计》), 2015, No. 11, pp. 3128-3133 (6 pages)
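Funding: National Natural Science Foundation of China (61173184); Science and Technology Program of the Chongqing Municipal Education Commission (KJ100821)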
Abstract: To address the inaccurate similarity measurement of microblog short texts caused by sparse features, a multi-view similarity algorithm for microblog short texts is proposed. Common blocks between two short texts are found by matching words that are identical in form or similar in meaning. A semantic similarity based on the common block sequence is then built from the total number of terms contained in the common blocks and the order in which the common blocks are combined. This semantic similarity is corrected using the posting time of the microblog short texts and information such as forwards and comments, yielding a new similarity algorithm for measuring the similarity between microblog short texts. The new algorithm is integrated into the Single-Pass clustering algorithm to detect microblog topics. Experimental results show that, when applied to microblog topic detection, the algorithm effectively reduces the average miss rate and false detection rate, among other measures, and improves the quality of topic detection.
Keywords: microblog short text similarity; microblog topic detection; structured information; common block sequence; semantic similarity
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
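
The abstract describes plugging a short-text similarity measure into Single-Pass clustering to detect topics. The following is a minimal sketch of that clustering step only, under stated assumptions: the `jaccard` function is a simple token-overlap stand-in and is not the paper's common-block-sequence similarity; comparing a post against a cluster by its best-matching member, the threshold value of 0.5, and the sample posts are all illustrative choices that do not come from the paper.

```python
from typing import Callable, List, Sequence

Tokens = Sequence[str]


def jaccard(a: Tokens, b: Tokens) -> float:
    """Stand-in similarity (assumption): token-set overlap.

    This is NOT the multi-view measure from the paper; any function
    returning a score in [0, 1] could be passed in its place.
    """
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def single_pass(
    texts: List[Tokens],
    sim: Callable[[Tokens, Tokens], float],
    threshold: float,
) -> List[List[int]]:
    """Cluster texts in arrival order, visiting each text exactly once.

    Each text joins the most similar existing cluster if that similarity
    reaches the threshold; otherwise it starts a new cluster (a new topic).
    Returns clusters as lists of indices into `texts`.
    """
    clusters: List[List[int]] = []
    for i, text in enumerate(texts):
        best_score, best_cluster = -1.0, None
        for cluster in clusters:
            # Assumption: score a cluster by its best-matching member.
            score = max(sim(text, texts[j]) for j in cluster)
            if score > best_score:
                best_score, best_cluster = score, cluster
        if best_cluster is not None and best_score >= threshold:
            best_cluster.append(i)
        else:
            clusters.append([i])
    return clusters


# Hypothetical, pre-tokenized sample posts.
posts = [
    ["earthquake", "hits", "city"],
    ["earthquake", "hits", "city", "today"],
    ["new", "phone", "released"],
    ["phone", "released", "new", "model"],
]
topics = single_pass(posts, jaccard, threshold=0.5)
print(topics)
```

Because the similarity function is passed in as a parameter, a measure that also accounts for posting time, forwards, and comments could replace `jaccard` without changing the clustering loop.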