Emotional Text Feature Selection Based on Information Gain and Chi-square (CHI) Statistics  Cited by: 2


Authors: YANG Xinyi; XIAO Lixue (School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121)

Affiliation: [1] School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121

Source: Computer & Digital Engineering, 2020, No. 11, pp. 2560-2563 (4 pages)

Abstract: Information gain (IG) measures the importance of a feature word by how much information about the corpus is gained between the two cases in which the word is absent and present. It considers only the word's contribution to the corpus as a whole and tends to overlook local influence. The chi-square statistic (CHI) applies the basic idea of statistical hypothesis testing: it first assumes that a feature word and a category are unrelated. It tends to overlook the influence of low-frequency words on the text. This paper fuses the two feature selection algorithms, IG and CHI, and on that basis attaches an additional weight to the feature values of sentiment words to distinguish them from non-sentiment words. With features selected by this algorithm, a support vector machine (SVM) classifier is used to classify the sentiment orientation of text data. Experimental results show that the approach greatly improves sentiment text classification.
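The abstract does not give the fusion formula, so the following is only a minimal sketch of the general idea, not the authors' method. It computes the textbook IG and CHI scores of a word from a two-class document-count table and then combines them under stated assumptions: each score is scaled by its maximum over the vocabulary, the two are averaged with a hypothetical mixing factor `alpha`, and words in a hypothetical sentiment lexicon are multiplied by a hypothetical `boost`. The function names, the example counts, and the lexicon are all illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(a, b, c, d):
    """IG of a word for a two-class corpus.

    a: positive docs containing the word, b: negative docs containing it,
    c: positive docs without the word,   d: negative docs without it.
    """
    n = a + b + c + d
    h_class = entropy([(a + c) / n, (b + d) / n])
    h_cond = 0.0
    for pos, neg in ((a, b), (c, d)):  # word present, then word absent
        m = pos + neg
        if m:
            h_cond += (m / n) * entropy([pos / m, neg / m])
    return h_class - h_cond


def chi_square(a, b, c, d):
    """Chi-square statistic of the same 2x2 table (null: word and class unrelated)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def fused_scores(counts, lexicon, alpha=0.5, boost=1.5):
    """ASSUMED fusion: max-scaled IG and CHI, linear mix, sentiment-word boost."""
    ig = {w: information_gain(*t) for w, t in counts.items()}
    chi = {w: chi_square(*t) for w, t in counts.items()}
    ig_max = max(ig.values()) or 1.0    # avoid dividing by zero
    chi_max = max(chi.values()) or 1.0
    scores = {}
    for w in counts:
        s = alpha * ig[w] / ig_max + (1 - alpha) * chi[w] / chi_max
        scores[w] = s * boost if w in lexicon else s
    return scores


# Illustrative counts (a, b, c, d) over 50 positive and 50 negative documents.
counts = {
    "great": (40, 5, 10, 45),
    "movie": (25, 25, 25, 25),
    "awful": (3, 30, 47, 20),
}
lexicon = {"great", "awful"}  # hypothetical sentiment lexicon
scores = fused_scores(counts, lexicon)
ranked = sorted(scores, key=scores.get, reverse=True)
```

The top-ranked words in `ranked` would then serve as the feature vocabulary for a downstream classifier such as an SVM. A word spread evenly across both classes (here "movie") scores zero under both measures, so the boost only matters for words that already carry some class signal.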

Keywords: information gain (IG); chi-square statistic (CHI); sentiment text; support vector machine (SVM)

Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]

 
