Cross-language Multi-label Sentiment Classification Based on Stacked Denoising AutoEncoder


Authors: TANG Shi-qi; ZHOU Rui-ping; XIE Shi-bin; LIU Meng-chi; XIAO Wen (Guangzhou Key Laboratory of Big Data and Intelligent Education, Guangzhou 510631, China; School of Computer Science, South China Normal University, Guangzhou 510631, China)

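Affiliations: [1] Guangzhou Key Laboratory of Big Data and Intelligent Education, Guangzhou, Guangdong 510631 [2] School of Computer Science, South China Normal University, Guangzhou, Guangdong 510631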

Source: Computer and Modernization (《计算机与现代化》), 2023, No. 11, pp. 6-12 (7 pages)

Funding: National Natural Science Foundation of China (61672389); Guangzhou Key Laboratory of Big Data and Intelligent Education project (201905010009).

Abstract: Multi-label sentiment classification addresses the case where a single instance may be associated with several sentiment labels. Most existing multi-label sentiment classification models are designed on the assumption of complete data, so their performance and semantic representations are easily degraded by the incompleteness inherent in real data. To address this problem, this paper proposes a cross-language multi-label sentiment classification model based on a stacked denoising autoencoder and introduces a label-aware loss function to compensate for the loss incurred during training. The model denoises word vectors with the stacked denoising autoencoder to build low-dimensional features of the original data, which reduces noise interference in the feature space and provides effective feature representations for downstream tasks. In multi-label sentiment classification experiments on the SemEval2018 datasets in three languages (English, Arabic, and Spanish), the model improves micro_F1, macro_F1, and jaccard on the test set; macro_F1 improves by about 0.82, 1.45, and 1.83 percentage points, respectively.
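
The three metrics named in the abstract can be illustrated with a small worked example. The following is a minimal sketch, not the authors' evaluation code: the function names `f1` and `multilabel_scores`, the toy emotion labels, and the convention that an instance with empty gold and predicted label sets scores 1.0 are all assumptions made for illustration. The Jaccard index is assumed here to be the per-instance average of |gold ∩ predicted| / |gold ∪ predicted|.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_scores(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Dict[str, float]:
    """Compute micro F1, macro F1, and sample-averaged Jaccard (hypothetical helper)."""
    # Per-label counts: [true positives, false positives, false negatives].
    counts = {label: [0, 0, 0] for label in labels}
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1

    # Micro F1 pools the counts over all labels before computing F1.
    micro = f1(
        sum(c[0] for c in counts.values()),
        sum(c[1] for c in counts.values()),
        sum(c[2] for c in counts.values()),
    )
    # Macro F1 averages the per-label F1 scores, weighting each label equally.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Jaccard: intersection over union per instance, averaged over instances.
    # Assumption: an instance with no gold and no predicted labels counts as 1.0.
    per_instance = [
        len(g & p) / len(g | p) if (g | p) else 1.0 for g, p in zip(gold, pred)
    ]
    jaccard = sum(per_instance) / len(per_instance)

    return {"micro_f1": micro, "macro_f1": macro, "jaccard": jaccard}


# Toy data (invented for illustration only).
labels = ["anger", "disgust", "fear", "joy", "optimism", "sadness"]
gold = [{"joy", "optimism"}, {"anger"}, {"sadness", "fear"}]
pred = [{"joy"}, {"anger", "disgust"}, {"sadness", "fear"}]

scores = multilabel_scores(gold, pred, labels)
print(scores)  # micro_f1 = 0.8, macro_f1 = jaccard = 2/3
```

In this toy case micro_F1 is higher than macro_F1 because the two labels that are never predicted correctly (`disgust`, `optimism`) each pull the macro average down by a full label's weight, while contributing only one error each to the pooled micro counts.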

Keywords: multi-label classification; sentiment classification; incomplete data; BERT; stacked denoising autoencoder

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
