Authors: CHEN Hongyang (陈红阳)[1], WANG Linlin (汪林林)[1], CHEN Yingsheng (陈滢生)[1], LU Jiangkun (鲁江坤), ZUO Xue (左雪)[1]
Affiliation: [1] College of Computer Engineering, Chongqing College of Humanities Science and Technology, Chongqing 401524, China
Source: Telecommunications Science (《电信科学》), 2017, No. 10, pp. 134-140 (7 pages)
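Funding: Chongqing Municipal Education Commission Science and Technology Program (No.KJ1601601); Chongqing Key Industry Common Key Technology Innovation Special Project (No.cstc2015zdcy-ztzx40007); National Natural Science Foundation of China (No.61173184)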
Abstract: Micro-blog short texts contain words that are identical or similar in form or meaning yet only loosely related to the topic. These words interfere considerably with accurate measurement of the similarity between texts and degrade the quality of topic discovery. This paper proposes a feature word selection algorithm that combines text content with structured information to extract representative feature words, and improves the strategies for computing text-to-text and text-to-topic similarity. The feature word selection algorithm and the similarity computation are then combined and applied to micro-blog text data for topic discovery. Experimental results show that the method effectively reduces the average miss rate and false detection rate of topic discovery and improves its quality.
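The abstract does not give the weighting formula, the structured fields used, or the clustering procedure, so the following is only a minimal sketch under stated assumptions and not the authors' method. It assumes hashtags stand in for the "structured information", ranks feature words by a smoothed TF-IDF weight with a hypothetical boost for hashtag terms, and groups posts with single-pass clustering on cosine similarity. The function names, the `boost` factor, `top_k`, and the `threshold` are all illustrative.

```python
import math
from collections import Counter


def select_features(docs, hashtags, top_k=5, boost=2.0):
    """Return one {word: weight} dict per post, keeping the top_k words.

    docs: list of token lists. hashtags: list of token sets, one per post
    (assumed stand-in for structured information; not from the paper).
    """
    n = len(docs)
    # Document frequency: number of posts containing each word.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc, tags in zip(docs, hashtags):
        weights = {}
        for word, count in Counter(doc).items():
            idf = math.log((n + 1) / (df[word] + 1)) + 1.0  # smoothed IDF
            factor = boost if word in tags else 1.0  # hypothetical hashtag boost
            weights[word] = (count / len(doc)) * idf * factor
        # Highest weight first; ties broken alphabetically for determinism.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        vectors.append(dict(top))
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(vectors, threshold=0.3):
    """Assign each post to the most similar topic, or open a new topic."""
    topics = []  # each topic: {"centroid": summed vector, "members": [indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["centroid"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Cosine ignores scale, so a running sum serves as the centroid.
            for w, v in vec.items():
                best["centroid"][w] = best["centroid"].get(w, 0.0) + v
        else:
            topics.append({"centroid": dict(vec), "members": [i]})
    return [t["members"] for t in topics]


if __name__ == "__main__":
    docs = [
        ["earthquake", "rescue", "team"],
        ["earthquake", "rescue", "aid"],
        ["football", "match", "goal"],
        ["football", "goal", "win"],
    ]
    hashtags = [{"earthquake"}, {"earthquake"}, {"football"}, {"football"}]
    print(single_pass(select_features(docs, hashtags)))
```

Keeping only the top-weighted words per post is one way to stop frequent but off-topic words from inflating similarity scores, which is the problem the abstract describes. The actual selection criteria and similarity measures in the paper may differ.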
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]