Source: Journal of Computer Applications (《计算机应用》), 2013, No. 12, pp. 3563-3566 (4 pages)
Funding: National Natural Science Foundation of China (61103114)
Abstract: To address the large volume of spam comments on microblogs, this paper proposes an AdaBoost-based method for identifying them. The method first extracts a feature vector of eight feature values to represent each comment. It then uses the AdaBoost algorithm to train several weak classifiers on these features, each performing better than random prediction, and finally combines the weak classifiers with weights into a high-accuracy strong classifier. Experiments on comment data sets extracted from popular Sina Weibo posts show that the eight selected features are effective and that the method achieves a high recognition rate for microblog spam comments.
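Keywords: microblog, spam comment identification, feature vector, AdaBoost algorithm, weak classifier
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]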
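
The pipeline summarized in the abstract (feature vector, weak classifiers, weighted combination) can be illustrated with a minimal AdaBoost sketch. This is not the paper's implementation: the abstract does not name the eight features or the type of weak classifier, so the sketch assumes single-feature threshold rules (decision stumps) as weak learners, and every number in the toy data is invented for illustration. Labels are +1 for spam and -1 for normal comments. The toy data is deliberately easy, so training stops after one perfect stump.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature_index, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # The stump predicts `sign` when feature j >= thr, else `-sign`.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] >= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_train(X, y, rounds=10):
    """Train a weighted ensemble of stumps; returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:  # no better than random guessing: stop
            break
        eps = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)  # vote weight of this stump
        ensemble.append((alpha, stump))
        if err == 0:  # a perfect stump leaves nothing to reweight
            break
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 8-value feature vectors (invented numbers, not the paper's features).
X = [
    [0.9, 3, 1, 0, 12, 0.1, 2, 1],  # spam
    [0.8, 1, 0, 1, 40, 0.3, 0, 0],  # spam
    [0.7, 4, 1, 1,  8, 0.2, 5, 1],  # spam
    [0.2, 0, 0, 0, 35, 0.6, 1, 0],  # normal
    [0.1, 2, 1, 0, 60, 0.8, 3, 1],  # normal
    [0.3, 0, 0, 1, 25, 0.7, 0, 0],  # normal
]
y = [1, 1, 1, -1, -1, -1]

model = adaboost_train(X, y, rounds=5)
train_preds = [adaboost_predict(model, row) for row in X]
print(train_preds)
```

On real comment data no single threshold would separate the classes, so the reweighting step would run for several rounds and the final vote would combine stumps on different features.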

