Authors: ZHOU Mo; SONG Yurong[1,2]; SONG Bo[1]; SU Xiaoping
Affiliations: [1] School of Computer Science and Software, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [2] School of Automation and Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; [3] School of Computer and Software Engineering, Nanjing Institute of Industry Technology, Nanjing 210046, Jiangsu, China
Source: Microelectronics & Computer (《微电子学与计算机》), 2021, No. 12, pp. 8-16 (9 pages)
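Funding: National Natural Science Foundation of China (61672298); Key Project of Philosophy and Social Science Research in Jiangsu Universities (2018SIZDI142); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (17YJAZH071).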
Abstract: Traditional recurrent neural networks (RNNs) carry a heavy modeling burden and tend to overlook local detail features, while convolutional neural networks (CNNs) cannot capture long-distance dependencies. To address these problems, a text classification model based on a disconnected information flow mechanism is proposed. The method introduces the idea of disconnected information flow into a bidirectional gated recurrent unit (BGRU), so that the model can extract long-distance contextual dependencies while also having a position invariance similar to that of a convolution kernel, thereby accounting for both the temporal and the spatial features of text. On this basis, a self-attention mechanism is integrated to further learn the dependencies among features: important features are assigned larger weights to reduce noise and redundancy, which strengthens the model's ability to extract key information and optimizes the text features. In experiments on five real-world datasets, including AGnews, DBPedia, and Yelp P., the method achieves higher accuracy than multiple baseline algorithms, reaching 95.8%, 99.7%, 98.1%, 70.4%, and 77.5%, respectively. These results verify that the model classifies text more effectively and has good application prospects.
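The abstract describes the model only in prose. Below is a minimal pure-Python sketch of the mechanism as we read it, assuming the disconnected-RNN formulation in which each hidden state is produced by a GRU restarted from a zero state over a fixed window of k tokens. All names (`GRUCell`, `disconnected_bgru`, `self_attention`, `encode`) are hypothetical. The window size, hidden size, random untrained weights, attention without learned Q/K/V projections, and the final max-pooling are illustrative assumptions, not details taken from the paper, and no training or classifier layer is shown.

```python
# Illustrative sketch only: window size, layer sizes, random weights, attention
# without learned projections, and max-pooling are assumptions, not from the paper.
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vadd(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Standard GRU cell with small random weights (untrained, illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {g: mat(hidden_dim, input_dim) for g in "zrn"}   # input weights
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}  # recurrent weights
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(v) for v in vadd(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in vadd(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in vadd(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, xs):
        """Run over xs from a zero state and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for x in xs:
            h = self.step(x, h)
        return h


def disconnected_bgru(xs, fwd, bwd, window):
    """h_t sees only `window` tokens: x[t-k+1..t] forward and x[t..t+k-1] backward.

    Restarting every window from a zero state "disconnects" the information flow,
    so identical windows yield identical outputs wherever they occur.
    """
    dim = len(xs[0])
    pad = [[0.0] * dim for _ in range(window - 1)]
    padded = pad + list(xs) + pad
    out = []
    for t in range(len(xs)):
        i = t + window - 1                          # position of x[t] in `padded`
        left = padded[i - window + 1 : i + 1]       # x[t-k+1] .. x[t]
        right = padded[i : i + window][::-1]        # x[t+k-1] .. x[t], read right to left
        out.append(fwd.run(left) + bwd.run(right))  # concatenate both directions
    return out


def self_attention(hs):
    """Scaled dot-product self-attention with Q = K = V = H (no learned projections)."""
    d = len(hs[0])
    out = []
    for q in hs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in hs]
        m = max(scores)                             # subtract max for a stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, hs)) for j in range(d)])
    return out


def max_pool(hs):
    """Element-wise max over time steps."""
    return [max(col) for col in zip(*hs)]


def encode(xs, hidden_dim=4, window=3):
    """Hypothetical pipeline: disconnected BGRU -> self-attention -> max-pooling."""
    fwd = GRUCell(len(xs[0]), hidden_dim, seed=1)
    bwd = GRUCell(len(xs[0]), hidden_dim, seed=2)
    return max_pool(self_attention(disconnected_bgru(xs, fwd, bwd, window)))


if __name__ == "__main__":
    rng = random.Random(42)
    tokens = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(7)]  # 7 toy embeddings
    print(len(encode(tokens)))  # 8, i.e. 2 * hidden_dim
```

Because each forward state is rebuilt from only k tokens, the same k-gram produces the same feature at any position, which is one way to obtain the convolution-like position invariance the abstract mentions; the attention step then reweights those per-position features before pooling.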
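Keywords: text classification; disconnected information; self-attention mechanism; recurrent neural network; convolutional neural network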
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]