A Sentiment Analysis Method Based on Class Imbalance Learning  (Cited by: 4)


Authors: LI Fang[1,2], QU Yubin, CHEN Xiang, LI Long[1], YANG Fan[5]

Affiliations: [1] Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, Guangxi Zhuang Autonomous Region, China; [2] School of Civil Engineering, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China; [3] School of Information Engineering, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China; [4] School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu Province, China; [5] Center of Library and Information, Jiangsu College of Engineering and Technology, Nantong 226001, Jiangsu Province, China

Source: Journal of Jilin University: Science Edition, 2021, No. 4, pp. 929-935 (7 pages)

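Funding: Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61202006); Research Project of the Guangxi Key Laboratory of Trusted Software (Grant No. kx202013); Scientific Research Program of Jiangsu College of Engineering and Technology (Grant No. GYKY/2019/9); the "Qinglan Project" of Jiangsu Universities and the Overseas Study Program of Jiangsu Universities.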

Abstract: Online reviews commonly exhibit class imbalance: negative reviews are relatively few, yet they carry disproportionate influence. To address this problem, a sentiment analysis method based on class imbalance learning is proposed. The method takes the probability outputs produced during deep learning training, computes the information entropy of each sample, and uses that entropy as an influence factor to construct a cross information entropy loss function. Experiments on the public IMDB dataset show that a bidirectional long short-term memory network trained with the integrated information entropy loss function can handle the class imbalance problem. Statistical analysis of the results shows that the strategy improves the performance of review sentiment polarity classification based on the bidirectional long short-term memory network. On the AUC (area under curve) metric, the median for the bidirectional long short-term memory network model with the integrated information entropy loss function is up to 15.3% higher than that of a deep learning model that does not account for class imbalance.
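The abstract does not give the exact form of the loss, so the following is only a minimal sketch of the general idea under stated assumptions: for binary sentiment labels, each sample's cross-entropy is multiplied by the Shannon entropy of its predicted probability. The multiplicative weighting and the function names `binary_entropy`, `cross_entropy`, and `entropy_weighted_loss` are illustrative assumptions, not the authors' published formulation.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli prediction with P(positive) = p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def cross_entropy(y, p, eps=1e-12):
    """Standard binary cross-entropy for a label y in {0, 1} and prediction p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def entropy_weighted_loss(labels, probs):
    """Mean cross-entropy with each sample weighted by its prediction entropy.

    ASSUMPTION: the weight is simply multiplied into the per-sample loss.
    The paper's actual loss may combine the two terms differently.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        total += binary_entropy(p) * cross_entropy(y, p)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    confident = [0.95, 0.05, 0.95, 0.05]
    uncertain = [0.60, 0.40, 0.60, 0.40]
    print("confident batch:", entropy_weighted_loss(labels, confident))
    print("uncertain batch:", entropy_weighted_loss(labels, uncertain))
```

Under this assumed form, samples the model is unsure about (predictions near 0.5) contribute more to the loss than samples it already classifies confidently. One consequence of the purely multiplicative choice is that confidently wrong predictions are also down-weighted, so a practical variant might add a constant to the weight.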

Keywords: text classification; long short-term memory network; class imbalance; cross-entropy loss function

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
