Research on Text Classification of Weibo Based on Weighted Word Vectors and LSTM-CNN

Cited by: 8


Authors: MA Yuan-hao, ZENG Wei-ming[1], SHI Yu-hu[1], XU Peng[1]

Affiliation: [1] College of Information Engineering, Shanghai Maritime University, Shanghai 201306

Source: Modern Computer (《现代计算机》), 2018, No. 17, pp. 18-22 (5 pages)

Abstract: In recent years, with the continuous development of network technology, Weibo has become increasingly popular as a social tool. As a result, a large amount of text carrying personal sentiment is produced on Weibo, and this text has a major influence on the spread of online public opinion, which makes the analysis of Weibo text an urgent task. To address this, a hybrid LSTM-CNN model is proposed for text classification. First, word vectors are trained with Word2Vec to overcome the high dimensionality and high sparsity of traditional text vector representations. Next, the word vectors are weighted with a TF-IDF model to reflect the importance of each word. Finally, the weighted word vectors serve as the initial input for training the hybrid LSTM-CNN classifier, which automatically extracts implicit features from the text and classifies Weibo comment data. Experimental results show that the method significantly improves classification accuracy on Weibo text content and can in turn effectively predict the propagation trend of Weibo public opinion.
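The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF variant (term frequency normalized by document length, IDF as log(N/df) without smoothing), the function names `tfidf_weights` and `weight_vectors`, and the toy two-dimensional embeddings standing in for trained Word2Vec vectors are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf}
        )
    return weights


def weight_vectors(doc, doc_weights, embeddings):
    """Scale each word's vector by its TF-IDF weight, preserving word order."""
    return [
        [doc_weights[w] * x for x in embeddings[w]]
        for w in doc
        if w in embeddings
    ]


# Toy tokenized comments and hypothetical 2-d embeddings (stand-ins for Word2Vec).
docs = [["good", "movie", "good"], ["bad", "movie"]]
embeddings = {"good": [1.0, 0.0], "bad": [0.0, 1.0], "movie": [0.5, 0.5]}

weights = tfidf_weights(docs)
sequences = [weight_vectors(d, w, embeddings) for d, w in zip(docs, weights)]
```

Each entry of `sequences` is an ordered list of weighted word vectors, which is the kind of input a sequence model such as the LSTM-CNN hybrid would consume. In this toy corpus the word "movie" occurs in every document, so its unsmoothed IDF is zero and its vectors are scaled to zero; a smoothed IDF would be one way to avoid that if it is not wanted.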

Keywords: text mining; Word2Vec; Weibo comments; sentiment analysis; LSTM; CNN

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
