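Authors: CHEN Jie (陈洁)[1], LI Shuai (李帅), ZHAO Shu (赵姝)[1], ZHANG Yanping (张燕平)[1]
Affiliation: [1] School of Computer Science and Technology, Anhui University, Hefei 230601, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), 2023, No. 12, pp. 3020-3028 (9 pages)
Funding: National Natural Science Foundation of China (61876001); Major Program of the National Social Science Fund of China (18ZDA032); Natural Science Foundation of the Higher Education Institutions of Anhui Province (KJ2021A0039).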
Abstract: Sentiment classification of text data often encounters fuzzy samples that are hard to classify. Because of their uncertainty, these samples tend to cause overfitting during training, which weakens the robustness of the model. Three-way decision theory divides the initial samples into a deterministic domain and an uncertain domain. How to choose a suitable feature representation for the uncertain domain, where the fuzzy data lie, so that it serves downstream tasks is an open challenge for three-way decision sentiment analysis models. To address this challenge, a robust sentiment analysis model based on feature representation of the three-way decision uncertain domain (UFR-SA) is proposed. First, the deterministic and uncertain domains are divided according to three-way decision theory; for the fuzzy samples in the uncertain domain, heterogeneous sample point pairs are defined and used to construct multi-granularity feature representations. Second, a multi-feature fusion model is designed that feeds the multi-granularity feature representations into a multilayer perceptron network to combine the strengths of the features at each granularity. Finally, a divide-and-conquer strategy is applied to the test samples: data in the deterministic domain are represented by the original features, and fuzzy data in the uncertain domain are represented by the fused robust features. Experimental results on the SST-2, SST-5, and CR datasets show that UFR-SA effectively reduces the interference of fuzzy data with the model and outperforms state-of-the-art models.
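Keywords: sentiment analysis; three-way decision; robustness; multi-granularity feature representation; feature fusion
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]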