Authors: CHEN Zhi [1]; GUO Wu [1] (National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027, China)
Affiliation: [1] National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2020, Issue 1, pp. 1-5 (5 pages)
Funding: National Key R&D Program of China (2016YFB1001303)
Abstract: In recent years, with the successful application of word embeddings and neural network models in natural language processing, neural-network-based methods have become the mainstream approach to text classification. However, when the training data across classes is imbalanced, the trained model is dominated by the majority classes and its predictions are biased toward them, which greatly degrades classification performance. To address this, we introduce class-label weights into the loss function during convolutional neural network training, strengthening the influence of minority classes on the model parameters. Experiments on the Fudan University text classification corpus show that the proposed method improves the macro-averaged F1 score by 4.49% over the baseline system, effectively mitigating the imbalanced-classification problem.
Classification code: TP391 [Automation & Computer Technology — Computer Application Technology]
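The abstract describes re-weighting the training loss by class label so that minority classes contribute more to the gradient. The paper does not publish its exact formulation here, so the following is a minimal NumPy sketch of the general idea: inverse-frequency class weights applied to a softmax cross-entropy loss (the function names and the weighting scheme are illustrative assumptions, not the authors' code).

```python
import numpy as np

def class_weights(labels, num_classes):
    # Inverse-frequency weighting (an assumed scheme): rarer classes
    # receive larger weights, so they are not drowned out in training.
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * counts)

def weighted_cross_entropy(logits, labels, weights):
    # Numerically stable softmax over the logits.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Per-sample negative log-likelihood, scaled by the weight of
    # each sample's true class before averaging.
    nll = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(weights[labels] * nll))

# Example: 4 majority-class samples vs. 1 minority-class sample.
labels = np.array([0, 0, 0, 0, 1])
w = class_weights(labels, num_classes=2)   # minority weight > majority weight
```

In a CNN classifier this weighted loss simply replaces the unweighted cross-entropy at the output layer; the rest of the network is unchanged.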