Microblog emotion classification based on feature extraction from part-of-speech tagging sequences  (cited by: 7)

Emotion classification with feature extraction based on part of speech tagging sequences in micro blog


Authors: LU Weisheng[1], GUO Gongde[1], CHEN Lifei[1]

Affiliation: [1] School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007

Source: Journal of Computer Applications (《计算机应用》), 2014, No. 10, pp. 2869-2873 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 61175123)

Abstract: Traditional n-gram feature extraction produces high-dimensional feature vectors, and high-dimensional data both makes classification harder and increases classification time. To address this problem, this paper proposes a feature extraction method based on part-of-speech (POS) tagging sequences. Because a POS sequence can represent a whole class of texts, the method uses groups of POS sequences as text features, which reduces the feature dimension. In the experiments, compared with n-gram feature extraction, POS-sequence feature extraction improved classification accuracy by at least 9% and reduced the feature dimension by 4,816. The experimental results show that the method is suitable for emotion classification of microblog posts.
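The abstract contrasts word n-gram features with POS-sequence features. The Python sketch below illustrates why the latter tends to give a smaller feature space. It is not the authors' implementation: the paper's exact definition of a "POS sequence group" is not stated here, so the sketch assumes the features are contiguous POS tag bigrams. The posts, their tags, and all function names are hypothetical, and the tags are hand-assigned rather than produced by a real tagger.

```python
from collections import Counter

# Hypothetical, hand-tagged microblog posts: lists of (word, POS tag) pairs.
# Tags are illustrative single letters: n noun, v verb, a adjective,
# d adverb, r pronoun, t time word, w punctuation.
POSTS = [
    [("今天", "t"), ("心情", "n"), ("很", "d"), ("好", "a"), ("!", "w")],
    [("电影", "n"), ("非常", "d"), ("精彩", "a"), ("!", "w")],
    [("我", "r"), ("讨厌", "v"), ("下雨", "v")],
]


def ngrams(items, n):
    """Return the contiguous n-grams of a sequence as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def word_ngram_features(post, n=2):
    """Baseline: count word n-grams (the lexical surface of the post)."""
    return Counter(ngrams([word for word, _ in post], n))


def pos_sequence_features(post, n=2):
    """Assumed POS-sequence features: count n-grams over the tag sequence only."""
    return Counter(ngrams([tag for _, tag in post], n))


def vocabulary(feature_counters):
    """Union of all feature keys; its size is the feature-vector dimension."""
    vocab = set()
    for counts in feature_counters:
        vocab.update(counts)
    return sorted(vocab)


def vectorize(counts, vocab):
    """Map one post's feature counts onto a fixed vocabulary order."""
    return [counts.get(key, 0) for key in vocab]


if __name__ == "__main__":
    word_vocab = vocabulary(word_ngram_features(p) for p in POSTS)
    pos_vocab = vocabulary(pos_sequence_features(p) for p in POSTS)
    print("word bigram dims:", len(word_vocab))  # word bigram dims: 9
    print("POS bigram dims:", len(pos_vocab))    # POS bigram dims: 6
    for post in POSTS:
        print(vectorize(pos_sequence_features(post), pos_vocab))
```

Posts 1 and 2 share no word bigram, yet both contain the tag pattern d, a, w (adverb, adjective, punctuation), so their POS features overlap. Since tag n-grams are drawn from a small, fixed tag set T, their vocabulary is bounded by |T|^n, whereas word n-gram vocabularies keep growing with the lexicon of the corpus.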

Keywords: feature extraction; part of speech; tagging sequence; microblog emotion classification; polarity classification

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
