A method of micro-blog topic discovery based on feature words selection and text similarity
(Original Chinese title: 特征词选择与相似度融合的微博话题发现方法)


Authors: CHEN Hongyang[1], WANG Linlin[1], CHEN Yingsheng[1], LU Jiangkun, ZUO Xue[1] (College of Computer Engineering, Chongqing College of Humanities Science and Technology, Chongqing 401524, China)

Affiliation: [1] College of Computer Engineering, Chongqing College of Humanities Science and Technology, Chongqing 401524, China

Source: Telecommunications Science (《电信科学》), 2017, No. 10, pp. 134-140 (7 pages)

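Funding: Science and Technology Research Program of the Chongqing Municipal Education Commission (No. KJ1601601); Chongqing Special Program for Innovation in Generic Key Technologies of Key Industries (No. cstc2015zdcy-ztzx40007); National Natural Science Foundation of China (No. 61173184)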

Abstract: Micro-blog short texts contain terms that are identical or similar in form or meaning yet only weakly related to the topic. Such terms interfere substantially with accurate measurement of text similarity and degrade the quality of topic discovery. A feature word selection algorithm that combines micro-blog text content with structured information is proposed to extract representative feature words. The strategies for computing text-to-text and text-to-topic similarity are also improved. The feature word selection algorithm and the similarity computation are then fused and applied to micro-blog text data to discover topics. Experimental results show that the method effectively reduces the average miss rate and false detection rate of topic discovery and improves topic discovery quality.
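The abstract does not give the paper's formulas, so the following is only a generic sketch of the pipeline it describes, under stated assumptions. Feature words are scored by TF-IDF, with a hypothetical `boost` for terms that also appear in a post's hashtags standing in for "structured information"; text-to-text and text-to-topic similarity are plain cosine similarity (topic = summed vectors of its posts); topics are formed by single-pass clustering with a hypothetical `threshold`; and miss and false detection rates follow the usual TDT-style definitions. All function names (`select_features`, `cosine`, `single_pass`, `miss_and_false_alarm`) and the toy data are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def select_features(docs, hashtags, k=5, boost=1.0):
    """Keep the top-k terms per post by TF-IDF, boosting terms that also appear
    in the post's hashtags (hypothetical stand-in for 'structured information').
    Terms with zero weight (present in every post) are dropped."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    selected = []
    for doc, tags in zip(docs, hashtags):
        if not doc:
            selected.append({})
            continue
        tf = Counter(doc)
        scores = {}
        for t, c in tf.items():
            idf = math.log((1 + n) / (1 + df[t]))  # 0 when t occurs in every post
            weight = (c / len(doc)) * idf
            scores[t] = weight * (1 + boost) if t in tags else weight
        ranked = sorted(
            ((t, s) for t, s in scores.items() if s > 0),
            key=lambda kv: (-kv[1], kv[0]),  # highest weight first, ties by term
        )
        selected.append(dict(ranked[:k]))
    return selected


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass(vectors, threshold=0.3):
    """Single-pass clustering: assign each post to the most similar topic
    centroid if similarity >= threshold, otherwise open a new topic."""
    centroids, labels = [], []
    for vec in vectors:
        best, best_sim = None, 0.0
        for j, centroid in enumerate(centroids):
            s = cosine(vec, centroid)  # text-to-topic similarity
            if s > best_sim:
                best, best_sim = j, s
        if best is not None and best_sim >= threshold:
            # Summed vectors point the same way as the mean, so cosine is unchanged.
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
            labels.append(best)
        else:
            centroids.append(dict(vec))
            labels.append(len(centroids) - 1)
    return labels


def miss_and_false_alarm(detected, reference, total):
    """TDT-style rates for one topic; detected/reference are sets of post ids."""
    p_miss = len(reference - detected) / len(reference) if reference else 0.0
    off_topic = total - len(reference)
    p_fa = len(detected - reference) / off_topic if off_topic else 0.0
    return p_miss, p_fa


if __name__ == "__main__":
    docs = [
        ["earthquake", "sichuan", "rescue", "news"],
        ["earthquake", "rescue", "team", "news"],
        ["football", "match", "goal", "news"],
        ["football", "goal", "fans", "news"],
    ]
    tags = [{"earthquake"}, {"earthquake"}, {"football"}, set()]
    vecs = select_features(docs, tags, k=5)
    print(single_pass(vecs, threshold=0.3))  # → [0, 0, 1, 1]
    print(miss_and_false_alarm({0, 1, 2}, {0, 1}, total=4))  # → (0.0, 0.5)
```

In the toy run, the term "news" occurs in every post, so its IDF is zero and it never becomes a feature word; this mirrors, in a simplified way, the abstract's point that frequent but topic-irrelevant terms distort similarity. Single-pass clustering is used here only because it is incremental and simple for streaming posts; the threshold controls how coarse the discovered topics are, and the paper's actual clustering strategy may differ.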

Keywords: micro-blog; feature words; selection; similarity; topic discovery

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
