Rumor Detection of Public Health Emergencies Based on Data Augmentation and Multi-Task Learning  Cited by: 4


Authors: Zeng Ziming (曾子明)[1]; Zhang Yu (张瑜) (School of Information Management, Wuhan University, Wuhan 430072, China)

Affiliation: [1] School of Information Management, Wuhan University, Wuhan 430072, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2023, No. 11, pp. 56-67 (12 pages)

Funding: One of the research outputs of a National Social Science Fund of China project (Grant No. 21BTQ046).

Abstract: [Objective] This paper introduces a multi-task learning model and a data augmentation method to address unbalanced data and scarce labeled data in rumor detection during public health emergencies. [Methods] Text features of public health emergency rumors are first extracted to build a replacement word list, and a CEDA method based on the extended synonym table is constructed to augment the unbalanced rumor dataset. A multi-task learning model then integrates the domain information of the public health emergency sentiment classification and rumor detection tasks: a Transformer captures the shared features, and a BiLSTM model captures the features unique to the rumor detection task, improving the accuracy of rumor detection. [Results] The proposed multi-task learning model reached an F1 value of 0.972, which is 0.006 and 0.007 higher than the model based on the unbalanced dataset and the single-task learning model, respectively, and 0.024 higher than the DC-CNN model. [Limitations] The auxiliary task of the multi-task learning model includes only binary sentiment classification; negative sentiment needs finer-grained classification. [Conclusions] The method based on domain data augmentation and multi-task learning can effectively improve the classification of public health emergency rumors.
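The augmentation step in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of CEDA, so the code below is only a generic EDA-style synonym replacement used to oversample the minority class, not the authors' method. The synonym table, the function names `synonym_replace` and `balance`, and the toy English tokens are all hypothetical; real input would be word-segmented Chinese text.

```python
import random

# Hypothetical domain synonym table (illustrative entries, not from the paper).
SYNONYMS = {
    "virus": ["pathogen"],
    "cure": ["remedy", "treatment"],
    "outbreak": ["epidemic"],
}

def synonym_replace(tokens, synonyms, n, rng):
    """Return a copy of tokens with up to n replaceable tokens swapped for a synonym."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out

def balance(samples, synonyms, seed=0):
    """Add augmented copies of smaller classes until every class matches the largest.

    samples is a list of (tokens, label) pairs. A sample with no token in the
    synonym table is copied unchanged, which adds a plain duplicate.
    """
    rng = random.Random(seed)
    by_label = {}
    for tokens, label in samples:
        by_label.setdefault(label, []).append(tokens)
    target = max(len(items) for items in by_label.values())
    balanced = list(samples)
    for label, items in by_label.items():
        for k in range(target - len(items)):
            source = items[k % len(items)]  # cycle through the class's originals
            balanced.append((synonym_replace(source, synonyms, 1, rng), label))
    return balanced

# Toy unbalanced dataset: one rumor, three non-rumors.
data = [
    (["garlic", "can", "cure", "the", "virus"], "rumor"),
    (["officials", "report", "new", "cases"], "non-rumor"),
    (["hospitals", "add", "beds"], "non-rumor"),
    (["schools", "reopen", "monday"], "non-rumor"),
]
balanced = balance(data, SYNONYMS)
for tokens, label in balanced:
    print(label, " ".join(tokens))
```

In this sketch the single rumor sample gains two augmented copies, each differing from the original in one token, so both classes end with three samples. Restricting replacements to a domain table is one plausible way to keep the augmented text on topic.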

Keywords: public health emergencies; rumor detection; data augmentation; multi-task learning; shared Transformer

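Classification codes: TP393 [Automation and Computer Technology: Computer Application Technology]; G350 [Automation and Computer Technology: Computer Science and Technology]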
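The results in the abstract are reported as F1 values. A small worked example may help show why F1 is a common choice for unbalanced rumor data. It assumes the binary F1 of the positive (rumor) class; the abstract does not say whether the paper uses this or an averaged variant. The function names `f1_score` and `accuracy` and the toy labels are illustrative only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy unbalanced labels: 2 rumors (1) and 8 non-rumors (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_all_negative = [0] * 10                    # never predicts a rumor
y_mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]     # finds one rumor, one false alarm

print(accuracy(y_true, y_all_negative), f1_score(y_true, y_all_negative))
print(accuracy(y_true, y_mixed), f1_score(y_true, y_mixed))
```

Both toy classifiers have the same accuracy, but only the second one detects any rumor, and F1 separates them where accuracy does not.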

 
