Authors: HE Chao-bo (贺超波); TANG Yong (汤庸)[2]; ZHANG Qiong (张琼)[3]; LIU Shuang-yin (刘双印); LIU Hai (刘海)[2]
Affiliations: [1] School of Information Science and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, Guangdong 510225, China; [2] School of Computer, South China Normal University, Guangzhou, Guangdong 510631, China; [3] School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006, China
Source: Acta Electronica Sinica (《电子学报》), 2019, No. 5, pp. 1086-1093 (8 pages)
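Funding: National Natural Science Foundation of China (No.61772211); Science and Technology Planning Project of Guangdong Province (No.2017A040405057; No.2017A030303074; No.2016A030303058); Science and Technology Program of Guangzhou (No.201807010043)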
Abstract: Clustering the large volume of short texts generated by social media has significant practical value. However, short texts tend to be noisy, fast-growing, and massive in volume, which makes them difficult for existing algorithms to handle effectively. To address this problem, we propose STOCIRNMF, a short text online clustering algorithm based on incremental robust nonnegative matrix factorization. STOCIRNMF builds its short text clustering model on nonnegative matrix factorization (NMF), uses the l_{2,1} norm to design the objective function and thereby improve robustness, and applies incremental iterative update rules to cluster short texts online. Experiments on Sohu news title and Weibo short text datasets show that STOCIRNMF not only achieves better clustering performance than existing representative algorithms but can also effectively detect microblog topics online.
Keywords: short text clustering; robust nonnegative matrix factorization; online clustering; l_{2,1} norm; incremental iterative update rules
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
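The abstract names two ingredients, an NMF clustering model and an l_{2,1}-norm objective, without giving the update rules. The sketch below is not STOCIRNMF: it is a generic batch robust NMF using the widely used multiplicative updates for min ||X - FG||_{2,1} with F, G >= 0, and it omits the paper's incremental update rules entirely. The names `l21_norm` and `robust_nmf`, the toy matrix, and the convention that columns of X are documents are all assumptions made for illustration.

```python
import numpy as np


def l21_norm(M):
    """l_{2,1} norm: sum of the Euclidean norms of the columns of M."""
    return float(np.sum(np.linalg.norm(M, axis=0)))


def robust_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Batch NMF that minimizes ||X - F G||_{2,1} with F, G >= 0.

    X is a nonnegative (features x documents) float matrix. This is a
    generic illustration, not the STOCIRNMF algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps  # basis (topic) matrix
    G = rng.random((k, n)) + eps  # per-document cluster weights
    history = [l21_norm(X - F @ G)]
    for _ in range(n_iter):
        # Each document is weighted by the inverse of its residual norm,
        # so poorly fitted (noisy) documents influence the update less.
        resid = np.linalg.norm(X - F @ G, axis=0)
        d = 1.0 / np.maximum(resid, eps)
        # Multiplying by d column-wise is the same as right-multiplying
        # by diag(d).
        F *= ((X * d) @ G.T) / (F @ ((G * d) @ G.T) + eps)
        G *= (F.T @ (X * d)) / (F.T @ F @ (G * d) + eps)
        history.append(l21_norm(X - F @ G))
    return F, G, history


# Toy term-document matrix: 4 terms, 6 documents (hypothetical data).
X = np.array(
    [
        [3, 2, 3, 0, 0, 0],
        [2, 3, 2, 0, 0, 0],
        [0, 0, 0, 2, 3, 2],
        [0, 0, 0, 3, 2, 3],
    ],
    dtype=float,
)
F, G, history = robust_nmf(X, k=2)
labels = G.argmax(axis=0)  # cluster index of each document
print(labels, history[0], history[-1])
```

Because the l_{2,1} norm sums unsquared column norms, a single noisy document adds to the objective linearly rather than quadratically, which is the usual argument for its robustness over the Frobenius norm.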