Short Text Classification Model Based on Integrated Neural Networks  Cited by: 12

Authors: GAO Yunlong[1,2], ZUO Wanli, WANG Ying[1,2], WANG Xin[2,3]

Affiliations: [1] College of Computer Science and Technology, Jilin University, Changchun 130012, China; [2] Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China; [3] School of Computer Technology and Engineering, Changchun Institute of Technology, Changchun 130012, China

Source: Journal of Jilin University (Science Edition), 2018, No. 4, pp. 933-938 (6 pages)

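Funding: National Natural Science Foundation of China (Grant Nos. 60903098, 60973040); National Natural Science Foundation of China, Youth Science Fund (Grant No. 61300148); Graduate Innovation Project of Jilin University (Grant No. 2016184)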

Abstract: Short texts are highly sparse and contain very few words. To better handle short text classification, we propose a short text classification model based on integrated neural networks. First, extended word vectors are used as the model input, so that the numerical word vectors can effectively describe the morphological, syntactic, and semantic features of short texts. Second, a recurrent neural network (RNN) is used to model the semantics of short texts and capture the dependencies within their internal structure. Finally, during training, a regularization term is used to select the model that minimizes both empirical risk and model complexity. Short text classification experiments on a corpus show that the proposed model achieves good classification performance, handles variable-length short text input, and is robust.
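
The abstract names three ingredients: word vectors as input, an RNN over the word sequence, and a regularized training objective. The sketch below illustrates how these fit together in general; it is not the authors' model. The abstract does not specify the architecture, the construction of the extended word vectors, or the form of the regularization term, so the Elman-style recurrence, the L2 penalty, the class name `TinyRNNClassifier`, and all dimensions and hyperparameters here are assumptions chosen for illustration. Only the forward pass and the objective are shown; no training step is included.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNNClassifier:
    """Hypothetical Elman-style RNN classifier (illustrative assumption only)."""

    def __init__(self, input_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_xh = make_matrix(hidden_dim, input_dim, rng)   # input -> hidden
        self.W_hh = make_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.W_hy = make_matrix(num_classes, hidden_dim, rng)  # hidden -> classes

    def predict_proba(self, word_vectors):
        """Run one recurrent step per word, so any sequence length is accepted."""
        h = [0.0] * self.hidden_dim
        for x in word_vectors:
            a = matvec(self.W_xh, x)
            b = matvec(self.W_hh, h)
            h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
        return softmax(matvec(self.W_hy, h))

    def l2_penalty(self):
        """Sum of squared weights, used here as the model-complexity term."""
        mats = (self.W_xh, self.W_hh, self.W_hy)
        return sum(w * w for m in mats for row in m for w in row)

    def regularized_loss(self, batch, lam=1e-3):
        """Empirical risk (mean negative log-likelihood) plus lam * L2 penalty."""
        nll = 0.0
        for word_vectors, label in batch:
            nll -= math.log(self.predict_proba(word_vectors)[label])
        return nll / len(batch) + lam * self.l2_penalty()


if __name__ == "__main__":
    rng = random.Random(1)
    # Two made-up "short texts" of different lengths, each word a 4-dim vector.
    short = [[rng.random() for _ in range(4)] for _ in range(3)]
    longer = [[rng.random() for _ in range(4)] for _ in range(7)]
    model = TinyRNNClassifier(input_dim=4, hidden_dim=5, num_classes=2)
    print(model.predict_proba(short))
    print(model.regularized_loss([(short, 0), (longer, 1)]))
```

Because the recurrence consumes one word vector per step, the same weights serve inputs of any length, which is the property the abstract refers to as handling variable-length short text. The weight `lam` trades off fit against model complexity in this assumed objective.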

Keywords: short text; integrated neural network; extended word vector; classification

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
