Research on Text Classification Based on Recurrent Neural Networks    Cited by: 41

Application of recurrent neural networks in text classification


Authors: HUANG Lei[1], DU ChangShun[1] (School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China)

Affiliation: [1] School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China

Source: Journal of Beijing University of Chemical Technology (Natural Science Edition), 2017, No. 1, pp. 98-104 (7 pages)

Abstract: Text classification is an important task in machine learning: given a classification model, a computer must assign texts to categories automatically, which helps people manage text collections and mine useful information. The growth of text data on the internet calls for algorithms that can both extract key features and classify texts efficiently on a computer. Traditional methods treat words as isolated symbols and ignore how they combine. This paper uses a bidirectional recurrent neural network with long short-term memory (LSTM) and gated recurrent unit (GRU) cells to extract text features, followed by a softmax layer for classification. This deep-learning model takes word vectors as its basic input units and thus captures the semantic and syntactic information of words; because the network processes words strictly in their original order, it also preserves the compositional semantics of the text, overcoming the shortcomings of traditional text classification methods. Experiments on the Xinhua news classification corpus released by the Third Conference on Natural Language Processing and Chinese Computing (NLPCC 2014) and on the Reuters RCV1-v2 corpus yield F1 scores of 88.3% and 50.5%, respectively, a significant improvement over traditional baseline models. Because the method requires no hand-designed features, it is also highly portable.
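The architecture described in the abstract (word vectors, then a bidirectional LSTM/GRU encoder, then softmax classification) can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' original implementation; the vocabulary size, embedding and hidden dimensions, and the mean-pooling step are placeholder assumptions.

```python
import torch
import torch.nn as nn

class BiRNNClassifier(nn.Module):
    """Bidirectional recurrent text classifier: embeddings -> BiLSTM/BiGRU -> softmax.
    A sketch of the architecture described in the abstract; all sizes are
    illustrative assumptions, not the paper's actual hyperparameters."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes, cell="lstm"):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        rnn_cls = nn.LSTM if cell == "lstm" else nn.GRU
        # bidirectional=True reads the word sequence in both directions while
        # strictly preserving word order, as the abstract emphasizes
        self.rnn = rnn_cls(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)  # 2x: forward + backward states

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        outputs, _ = self.rnn(embedded)       # (batch, seq_len, 2 * hidden_dim)
        # average the hidden states over time as the text feature (a common choice)
        features = outputs.mean(dim=1)
        return self.fc(features)              # logits; softmax is applied below

model = BiRNNClassifier(vocab_size=5000, embed_dim=100, hidden_dim=128, num_classes=4)
token_ids = torch.randint(0, 5000, (2, 20))  # batch of 2 texts, 20 tokens each
probs = torch.softmax(model(token_ids), dim=-1)
print(probs.shape)  # each row is a distribution over the 4 classes
```

Swapping `cell="lstm"` for `cell="gru"` switches the computation node from LSTM to GRU, mirroring the two variants compared in the paper.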

Keywords: text classification; deep learning; long short-term memory (LSTM); gated recurrent unit (GRU); bidirectional recurrent neural network; word embedding

Classification code: TP391.1 [Automation and Computer Technology - Computer Application Technology]

 
