Identification method of spam comments in microblog based on AdaBoost (基于AdaBoost的微博垃圾评论识别方法)

Cited by: 6

Authors: Huang Ling (黄铃)[1], Li Xueming (李学明)[1]

Affiliation: [1] College of Computer Science, Chongqing University, Chongqing 400044, China

Source: Journal of Computer Applications (《计算机应用》), 2013, No. 12, pp. 3563-3566 (4 pages)

Funding: National Natural Science Foundation of China (61103114)

Abstract: To address the large number of spam comments on microblogs, a method based on AdaBoost is proposed to identify them. The method first extracts a feature vector of eight feature values to represent each comment, then uses the AdaBoost algorithm to train several weak classifiers on these features, each performing better than random prediction, and finally combines the weak classifiers by weighting into a high-precision strong classifier. Experiments on comment datasets extracted from popular Sina microblogs show that the eight selected features are effective and that the method achieves a high recognition rate for spam comments in microblogs.

Keywords: microblog; spam comment identification; feature vector; AdaBoost algorithm; weak classifier

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
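
The pipeline summarized in the abstract (an eight-value feature vector per comment, weak classifiers that beat random prediction, and a weighted combination into a strong classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the eight features or the type of weak classifier, so the sketch uses placeholder random features, a synthetic labeling rule, and one-feature threshold rules (decision stumps) as weak classifiers. The function names are hypothetical and are not taken from the paper.

```python
import math
import random


def stump_predict(row, feature, threshold, polarity):
    """Weak classifier (assumed form): a one-feature threshold rule giving +1 or -1."""
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, weights):
    """Search every feature, threshold, and polarity for the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(
                    w for row, label, w in zip(X, y, weights)
                    if stump_predict(row, feature, threshold, polarity) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, rounds=10):
    """Train up to `rounds` weighted stumps; labels must be +1 (spam) or -1 (normal)."""
    n = len(X)
    weights = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        error, feature, threshold, polarity = train_stump(X, y, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        if error >= 0.5:  # no stump is better than random prediction: stop
            break
        alpha = 0.5 * math.log((1.0 - error) / error)  # vote weight of this stump
        model.append((alpha, feature, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, then normalize.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for row, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def adaboost_predict(model, row):
    """Strong classifier: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in model)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder data: 120 comments with 8 random feature values each.
    X = [[rng.random() for _ in range(8)] for _ in range(120)]
    # Synthetic labeling rule, chosen only so that no single stump is perfect.
    y = [1 if row[0] + row[3] > 1.0 else -1 for row in X]
    model = adaboost_fit(X, y, rounds=10)
    correct = sum(adaboost_predict(model, row) == label for row, label in zip(X, y))
    print(f"training accuracy: {correct / len(X):.2f}")
```

Decision stumps are used here only because they are a common, simple choice of weak learner for AdaBoost; the paper's actual weak classifiers and feature definitions may differ.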

 
