Research on Imbalanced Review Sentiment Classification Based on the KMeans-EDA Algorithm


Author: GUO Ka[1] (School of Information and Mathematics, Anhui Foreign Languages University, Hefei 231200, China)

Affiliation: [1] School of Information and Mathematics, Anhui Foreign Languages University, Hefei 231200, Anhui, China

Source: Journal of Shandong University of Technology (Natural Science Edition), 2024, No. 4, pp. 45-52 (8 pages)

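Funding: Natural Science Research Project of Anhui Higher Education Institutions (KJ2020A0818); Key Research Project of Anhui Foreign Languages University (AWky2020012).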

Abstract: Learners' authentic reviews are an important indicator of the strengths and weaknesses of online courses, so obtaining their feedback quickly and accurately is essential for optimizing those courses. To mine learners' online learning behavior in depth and thereby provide an effective data basis for online teaching, this study crawls course review text from the Chinese University MOOC platform and, building on the structure of the BERT model, establishes a machine learning model based on self-attention text representations. The model performs accurate sentiment classification of review text and thereby captures learners' implicit emotional states. Because negative reviews are scarce in the crawled data, an adaptive balanced sampling training strategy named KMeans-EDA is designed; it addresses the model's bias toward the majority class during training and improves its ability to identify negative reviews. Experimental results show that this strategy raises the model's F1-score on review text from 0.6902 to 0.7399.
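As a rough illustration of the kind of minority-class balancing the abstract describes, the sketch below oversamples the smaller class with two simple token-level augmentations (random swap and random deletion) and computes a per-class F1-score. It is not the paper's KMeans-EDA procedure: the abstract does not specify the clustering step, the augmentation operations, or which F1 variant is reported, so all function names, parameters, and the toy data here are hypothetical. The sketch also assumes that "EDA" refers to easy-data-augmentation-style text operations and that reviews have already been split into tokens.

```python
import random
from collections import Counter


def random_swap(tokens, rng):
    """Return a copy of tokens with two randomly chosen positions swapped."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng, p=0.1):
    """Drop each token with probability p, always keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def balance_with_augmentation(samples, seed=0):
    """Add augmented copies to every smaller class until all classes
    reach the size of the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, n in counts.items():
        pool = [tokens for tokens, lab in samples if lab == label]
        for _ in range(target - n):
            tokens = rng.choice(pool)
            op = rng.choice([random_swap, random_deletion])
            balanced.append((op(tokens, rng), label))
    return balanced


def f1_score(y_true, y_pred, positive):
    """F1-score for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): three positive reviews, one negative.
reviews = [
    (["clear", "lectures", "and", "useful", "examples"], "positive"),
    (["great", "course", "well", "organized"], "positive"),
    (["the", "teacher", "explains", "things", "patiently"], "positive"),
    (["audio", "quality", "is", "poor", "and", "pace", "too", "fast"], "negative"),
]
balanced = balance_with_augmentation(reviews)
label_counts = Counter(label for _, label in balanced)

# A classifier that always predicts the majority class never finds a negative
# review, so its F1-score on the negative class is zero.
always_positive_f1 = f1_score(
    ["negative", "positive", "positive", "positive"],
    ["positive", "positive", "positive", "positive"],
    positive="negative",
)
```

After balancing, `label_counts` holds equal counts for both classes. The `always_positive_f1` value shows why a majority-biased model scores poorly on the minority class even when its overall accuracy looks high.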

Keywords: online courses; review text; text sentiment classification; pre-trained feature representation; imbalanced training

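Classification codes: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; TP391 [Automation and Computer Technology: Control Science and Engineering]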

 
