Construction of multi-modal social media dataset for fake news detection  Cited by: 4


Authors: GAO Guopeng, FANG Yaodong, HAN Yanfang[1], QIAN Zhenxing, QIN Chuan[1]

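Affiliations: [1] School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; [2] School of Computer Science, Fudan University, Shanghai 200433, China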

Source: Chinese Journal of Network and Information Security, 2023, No. 4, pp. 144-154 (11 pages)

Funding: National Natural Science Foundation of China (U20B2051, 62172280); Natural Science Foundation of Shanghai (21ZR1444600).

Abstract: The advent of social media has changed people's lives. Social media makes it easy to access and share news, but it has also fostered the creation and spread of fake news, which seriously threatens social security and stability. Fake news detection has therefore attracted wide attention from researchers. Although several deep learning-based solutions have been proposed, they require large amounts of training data. Existing fake news datasets, particularly Chinese ones, are scarce, and most of the news items they contain belong to a single category. To support better fake news detection, a new multi-modal fake news dataset (MFND) was constructed, comprising Chinese and English news from ten categories: politics, economy, entertainment, sports, international affairs, technology, military, education, health, and social life. The word frequencies and categories of MFND were analyzed, and MFND was compared with existing fake news datasets in terms of number of news items, news categories, modal information, and news languages; the comparison shows that MFND stands out in category information and news languages. In addition, typical existing fake news detection methods were trained and validated on MFND, and the experimental results show that, compared with existing mainstream fake news datasets, MFND yields a performance improvement of approximately 10% for these models.
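The abstract mentions analyzing the dataset's word frequencies and category distribution. The sketch below shows one simple way such statistics could be computed. The `NewsItem` record layout, the label encoding (1 = fake, 0 = real), the language codes, and the English-only regex tokenizer are all hypothetical assumptions, not MFND's actual file format or the authors' analysis procedure. Chinese text would first need a word segmenter, which this sketch deliberately omits.

```python
import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; the real MFND file format is not described here.
@dataclass
class NewsItem:
    text: str
    category: str  # e.g. "economy", one of the ten MFND categories
    language: str  # assumed codes: "zh" or "en"
    label: int     # assumed encoding: 1 = fake, 0 = real


def category_counts(items):
    """Count items per (category, label) pair to inspect class balance."""
    return Counter((item.category, item.label) for item in items)


def word_frequencies(items, language="en"):
    """Naive word counts for items in one language.

    Lowercases the text and keeps runs of ASCII letters/apostrophes, so it is
    only meaningful for English; Chinese text would need a segmenter first.
    """
    counts = Counter()
    for item in items:
        if item.language == language:
            counts.update(re.findall(r"[a-z']+", item.text.lower()))
    return counts


# Usage with made-up sample records (not MFND data).
sample = [
    NewsItem("Stocks rally as markets rise", "economy", "en", 0),
    NewsItem("Markets crash, stocks lost everything", "economy", "en", 1),
    NewsItem("体育新闻示例", "sports", "zh", 1),
]
print(category_counts(sample))
print(word_frequencies(sample).most_common(3))
```

Counting (category, label) pairs rather than categories alone makes it easy to spot categories whose fake/real split is skewed, which matters when a dataset spans many topics.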

Keywords: social media; fake news detection; multi-modal; multi-category; dataset

CLC number: TP393 [Automation and Computer Technology / Computer Application Technology]
