A Reuse-Detection-Based Algorithm for Filtering Microblog Spam Users (Cited by: 8)

Detecting microblog spammers based on reuse detection

Source: Journal of Nanjing University (Natural Science), 2013, No. 4, pp. 456-464 (9 pages)

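Funding: Key Project of the Natural Science Foundation of Jiangsu Province (BK2011005); National Natural Science Foundation of China (61272221); Jiangsu Provincial Social Science Foundation (12YYA002)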

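Abstract: To address spam filtering in microblogs, this paper proposes a spam-user detection algorithm based on a reuse detection model. The method jointly considers the textual correlation and the temporal correlation within a message sequence in order to model the posting behaviour of spammers. Depending on text granularity, the detection algorithm takes two forms: sentence-level detection (SRD) and term-level detection (TRD). SRD emphasises users' behaviour patterns, whereas TRD emphasises the topic of the spam messages. Experiments on a real dataset show that SRD outperforms TRD in overall performance, while TRD runs more efficiently and is more targeted, so it can find spammers of a specified type. Finally, the paper makes a preliminary attempt at applying reuse detection to spammer-group detection; the experiments show that a reuse detection algorithm based on retweeting relationships can find genuine groups of spammers.

English abstract: The tremendous increase in spam has become a serious problem. In this paper, we aim to detect microblog spammers by means of retweeting relationships. We introduce a new reuse detection model, which simultaneously incorporates text content and temporal information, to rate the intensity of spamming behaviours. We then present two spam detection algorithms based on this model: a sentence-level detection algorithm and a term-level one. The sentence-level algorithm favours the behaviour pattern of spammers and ignores the topic of spam messages. The term-level algorithm focuses on the topic of spam messages and compensates for the shortcomings of the sentence-level one. Finally, we evaluate our approaches on a real dataset collected from Sina microblog, the largest microblog in China. Extensive experiments show the effectiveness and efficiency of our algorithms.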

Keywords: spam messages; microblog; reuse detection

Classification code: TP393.08 [Automation and Computer Technology: Computer Application Technology]
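
Illustrative sketch: the abstract says the reuse detection model combines textual correlation and temporal correlation over a user's message sequence, but this record gives no formulas. The Python below is therefore only a rough illustration of that general idea, not the paper's SRD or TRD algorithm. Everything in it is an assumption made for the example: the function names (`char_bigrams`, `jaccard`, `reuse_score`), the use of Jaccard similarity over character bigrams as the text measure, the exponential time decay, and the `tau` parameter.

```python
import math
from itertools import combinations


def char_bigrams(text):
    """Return the set of character bigrams of a message, ignoring whitespace."""
    s = "".join(text.split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reuse_score(messages, tau=3600.0):
    """Hypothetical reuse score in [0, 1] for one user's messages.

    messages: list of (timestamp_in_seconds, text) pairs.
    tau: assumed decay constant in seconds; pairs posted far apart count less.
    The score is the mean over all message pairs of
    text similarity * exp(-|time gap| / tau).
    """
    if len(messages) < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for (t1, m1), (t2, m2) in combinations(messages, 2):
        text_sim = jaccard(char_bigrams(m1), char_bigrams(m2))
        time_weight = math.exp(-abs(t1 - t2) / tau)
        total += text_sim * time_weight
        pairs += 1
    return total / pairs


# Made-up example data: near-identical posts a minute apart versus
# unrelated posts a day apart.
spammer = [(0, "Buy cheap followers now"),
           (60, "Buy cheap followers now!"),
           (120, "Buy cheap followers now")]
normal = [(0, "Lovely weather today"),
          (86400, "Watching the match tonight")]
print(reuse_score(spammer) > reuse_score(normal))
```

Under these assumptions, a user who repeats nearly the same text within a short time window scores close to 1, while a user whose posts are dissimilar or widely spaced scores close to 0. A term-level variant in the same spirit would compare selected terms rather than whole messages.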
