Micro-Blog Fine-Grained Sentiment Analysis Based on Multi-Feature Fusion  Cited by: 4


Authors: Wu Xuxu; Chen Peng[1]; Jiang Huan

Affiliations: [1] School of Information and Cyber Security, People’s Public Security University of China, Beijing 100045, China; [2] School of E-Business and Logistics, Beijing Technology and Business University, Beijing 100048, China

Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2023, No. 12, pp. 102-113 (12 pages)

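Funding: This work is one of the research outputs of the Fundamental Research Funds project of People’s Public Security University of China (Grant No. 2022JKF02018).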

Abstract: [Objective] Existing Weibo sentiment analysis models fall short in extracting text-related features and in mining the sentiment information carried by post content. This paper proposes the RB-LCM model to improve fine-grained sentiment analysis of Weibo texts. [Methods] First, RoBERTa dynamically encodes the character-level and sentence-level features of Weibo posts. Next, a Bi-LSTM and a capsule network capture deeper global and local features of Weibo sentences. Multi-head self-attention feature fusion then combines these multi-dimensional features. During training, an improved Focal Loss and FGM are used to address label imbalance in the datasets and to improve the model's robustness. [Results] RB-LCM reached accuracy and F1 values of 80.64% and 77.41% on the SMP2020-EWECT dataset, 67.17% and 51.08% on the NLPCC2013 Task 2 dataset, and 71.27% and 58.25% on the NLPCC2014 Task 1 dataset. On the binary sentiment dataset weibo_senti_100k, accuracy and F1 reached 98.45% and 98.44%, respectively. The authors report that all of these results exceed those of advanced sentiment analysis models on each dataset. [Limitations] The model uses only text for sentiment analysis and does not yet incorporate related pictures, videos, audio, or other information. [Conclusions] The proposed RB-LCM model can effectively improve fine-grained sentiment analysis of Weibo posts.
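The abstract names Focal Loss as the remedy for label imbalance but does not describe the authors' improved variant. As a rough illustration only, the sketch below implements the standard multi-class Focal Loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), in plain Python. The function names, the default `gamma=2.0`, and the optional `alpha` class weights are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Standard focal loss for one sample (NOT the paper's improved variant).

    logits: list of raw scores, one per class
    target: index of the true class
    gamma:  focusing parameter; gamma=0 reduces to cross-entropy
    alpha:  optional per-class weights (hypothetical values chosen by the user)
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as focal loss with gamma=0."""
    return focal_loss(logits, target, gamma=0.0)


# An easy sample (confidently correct) and a hard sample (true class scored low).
easy_logits, hard_logits, true_class = [5.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0
easy_ratio = focal_loss(easy_logits, true_class) / cross_entropy(easy_logits, true_class)
hard_ratio = focal_loss(hard_logits, true_class) / cross_entropy(hard_logits, true_class)
```

In this sketch `easy_ratio` is far smaller than `hard_ratio`: the factor (1 - p_t)^gamma shrinks the loss of well-classified samples, so frequent, easy classes contribute less to training than rare or hard ones. This is the general mechanism by which Focal Loss counteracts imbalanced labels.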

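Keywords: RoBERTa; multi-head self-attention fusion; bidirectional long short-term memory network (Bi-LSTM); micro-blog sentiment analysis; capsule network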

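Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; G350 [Automation and Computer Technology: Computer Science and Technology]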

 
