Layering-based micro-blog emotion classification: Case study of Sina micro-blog  Cited by: 1

Authors: WANG Xiang-hua[1]; SONG Xin[1]

Affiliation: [1] College of Electronic Information Engineering, Tianjin Vocational Institute, Tianjin 300410, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2018, No. 11, pp. 3431-3437 (7 pages)

Funding: Tianjin Basic Research Program Fund Project (14JCTPJC00553); Tianjin Higher Education Science and Technology Development Fund Program Project (20130711)

Abstract: Most current micro-blog sentiment analysis algorithms have difficulty accurately capturing the differences between emotions. To address this problem, the emotional components and hierarchical emotion classification of Chinese micro-blogs are studied. Pre-processing removes non-emotional information. The ICTCLAS word segmentation toolkit is used to segment the text, and adjectives, nouns, and verbs are extracted to form features. The chi-square (χ²) test, word frequency, and pointwise mutual information (PMI) are used to select features, and support vector regression (SVR) together with rule sets is used for classification. The data set consists of original Chinese micro-blogs from Sina. Experimental results across different groups verify the effectiveness of the method: its F-measure and related metrics at multiple levels are better than those of comparable methods. In an evaluation of 50 randomly selected micro-blogs, nearly half of the results were supported by all of the judges.
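The abstract names the χ² test and PMI as feature-selection criteria but gives no formulas. The following is a minimal sketch of how such term-emotion scores are commonly computed, using the standard textbook definitions over document frequencies. It is not the authors' implementation: the function names, the toy corpus, and the English tokens are all hypothetical, and the text is assumed to have been segmented already (the paper uses ICTCLAS for that step).

```python
import math
from collections import Counter

def term_class_counts(docs):
    """Collect document-frequency statistics from (tokens, label) pairs."""
    n_docs = len(docs)
    class_count = Counter(label for _, label in docs)
    term_count = Counter()
    joint_count = Counter()
    for tokens, label in docs:
        for term in set(tokens):          # count each term once per document
            term_count[term] += 1
            joint_count[(term, label)] += 1
    return n_docs, class_count, term_count, joint_count

def pmi(term, label, stats):
    """Pointwise mutual information: log2( P(t, c) / (P(t) * P(c)) )."""
    n, class_count, term_count, joint_count = stats
    a = joint_count[(term, label)]
    if a == 0:
        return float("-inf")              # term never co-occurs with the class
    return math.log2(a * n / (term_count[term] * class_count[label]))

def chi_square(term, label, stats):
    """Chi-square statistic from the 2x2 term/class contingency table."""
    n, class_count, term_count, joint_count = stats
    a = joint_count[(term, label)]        # term present, document in class
    b = term_count[term] - a              # term present, document not in class
    c = class_count[label] - a            # term absent, document in class
    d = n - a - b - c                     # term absent, document not in class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical toy corpus: already-segmented posts with an emotion label.
docs = [
    (["happy", "day"], "joy"),
    (["happy", "friend"], "joy"),
    (["sad", "day"], "sadness"),
    (["sad", "rain"], "sadness"),
]
stats = term_class_counts(docs)

for term in ["happy", "day"]:
    print(term, pmi(term, "joy", stats), chi_square(term, "joy", stats))
```

In this toy corpus, "happy" appears only in posts labelled "joy" and so scores high on both measures, while "day" is spread evenly across both labels and scores zero. A feature selector would keep the former and drop the latter. How the paper combines these scores with word frequency is not stated in the abstract.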

Keywords: micro-blog; emotion classification; pointwise mutual information; emotion component analysis; support vector regression

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]