The feature selection of sentiment text based on CHI and information gain  (Cited by: 3)


Authors: HUANG Mengying; ZHANG Xiaobin[1] (School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, China)

Affiliation: [1] School of Computer Science, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China

Source: Journal of Xi'an Polytechnic University, 2018, No. 6, pp. 713-717 (5 pages)

Funding: Natural Science Foundation of Shaanxi Province (2015JQ5157)

Abstract: The chi-square statistic (CHI) ignores the influence of low-frequency words on text classification, while information gain (IG) considers only a feature's contribution to the collection as a whole and ignores its local effect. After analysing the CHI and IG feature selection algorithms, the paper proposes CHI-IG, a feature selection algorithm that combines the two and is suited to sentiment text classification. The algorithm adds weights to the CHI and IG methods, combining their advantages and reducing the impact of their respective shortcomings. On this basis, an additional weight is attached to the feature values of sentiment words to distinguish them from non-sentiment words. Sentiment texts are then classified in experiments that apply the algorithm together with random forest and support vector machine (SVM) classifiers. The results show that the method effectively improves the classification efficiency of sentiment text.
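The abstract describes the weighting scheme only in words, so the following Python sketch is one possible reading and not the authors' formula. It assumes a two-class setting, min-max normalization of both scores, a linear mix `alpha * CHI + (1 - alpha) * IG`, and a multiplicative boost `beta` for words found in a sentiment lexicon. The names `chi_ig_scores`, `alpha`, `beta`, and `sentiment_terms`, along with the toy document counts, are hypothetical.

```python
import math


def chi_square(a, b, c, d):
    """CHI for one term and one class from a 2x2 document-count table.

    a: in-class docs containing the term    b: out-of-class docs containing it
    c: in-class docs lacking the term       d: out-of-class docs lacking it
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a, b, c, d):
    """IG of the term for the two-class problem: H(C) - H(C | term presence)."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_cond = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return h_class - h_cond


def min_max(values):
    """Rescale values to [0, 1] so CHI and IG are comparable (an assumption)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def chi_ig_scores(tables, sentiment_terms, alpha=0.5, beta=1.5):
    """Hypothetical combined score: weighted CHI/IG mix, boosted for sentiment words."""
    terms = list(tables)
    chi = min_max([chi_square(*tables[t]) for t in terms])
    ig = min_max([information_gain(*tables[t]) for t in terms])
    scores = {}
    for term, x, g in zip(terms, chi, ig):
        score = alpha * x + (1 - alpha) * g
        if term in sentiment_terms:
            score *= beta  # extra weight separates sentiment from non-sentiment words
        scores[term] = score
    return scores


# Toy counts (a, b, c, d) over 100 documents, 50 per class; invented for illustration.
tables = {
    "great": (40, 5, 10, 45),
    "the": (25, 25, 25, 25),
    "awful": (3, 30, 47, 20),
}
scores = chi_ig_scores(tables, sentiment_terms={"great", "awful"})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In a pipeline of the kind the abstract outlines, the top-ranked terms would form the feature set passed to a random forest or SVM classifier. Values for `alpha` and `beta` would need tuning on held-out data, since the abstract does not report the weights used.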

Keywords: chi-square statistic (CHI); information gain; feature selection; sentiment text; random forest; support vector machine

Classification number: TP301.6 [Automation and Computer Technology: Computer Systems Architecture]

 
