Authors: YANG Xinyi (杨新怡); XIAO Lixue (肖利雪) (School of Computer Science, Xi'an University of Posts and Telecommunications, Xi'an 710121)
Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 11, pp. 2560-2563 (4 pages)
Abstract: Information gain (IG) measures the importance of a feature word by how much information the corpus gains between the case where the word is absent and the case where it is present. IG considers only a word's contribution to the corpus as a whole and tends to overlook local influence. The chi-square statistic (CHI) applies the basic idea of statistical hypothesis testing: it first assumes that a feature word and a class are unrelated. CHI tends to overlook the effect of low-frequency words on the text. This paper fuses the two feature selection algorithms, IG and CHI, and additionally weights the feature values of sentiment words to distinguish them from non-sentiment words. Based on this algorithm, a support vector machine (SVM) classifier is used to classify the sentiment orientation of text data. Experimental results show that this approach greatly improves sentiment text classification.
Keywords: information gain (IG); chi-square statistic (CHI); sentiment text; support vector machine (SVM)
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
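The abstract names the ingredients of the method (IG, CHI, an extra weight for sentiment words) but does not give the fusion formula, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It computes the standard two-class IG and CHI scores for a term from its document-count contingency table. The fusion step (a weighted sum of max-normalized scores), the parameters `alpha` and `sentiment_weight`, and the function names are all hypothetical choices made for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def info_gain(a, b, c, d):
    """Information gain of a term in a two-class corpus.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, h_class - h_given_term)


def chi_square(a, b, c, d):
    """Chi-square statistic of a term and a class from the same 2x2 table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def fused_scores(tables, sentiment_words, alpha=0.5, sentiment_weight=2.0):
    """Hypothetical fusion: weighted sum of max-normalized IG and CHI,
    multiplied by sentiment_weight for words in the sentiment lexicon.
    The paper's actual fusion rule is not stated in the abstract."""
    ig = {term: info_gain(*tab) for term, tab in tables.items()}
    chi = {term: chi_square(*tab) for term, tab in tables.items()}
    ig_max = max(ig.values()) or 1.0    # avoid dividing by zero
    chi_max = max(chi.values()) or 1.0
    scores = {}
    for term in tables:
        score = alpha * ig[term] / ig_max + (1 - alpha) * chi[term] / chi_max
        if term in sentiment_words:
            score *= sentiment_weight
        scores[term] = score
    return scores


if __name__ == "__main__":
    # Toy counts (a, b, c, d) over 20 documents, 10 per class; invented for illustration.
    tables = {"great": (10, 0, 0, 10), "the": (5, 5, 5, 5), "film": (6, 4, 4, 6)}
    ranked = sorted(fused_scores(tables, {"great"}).items(), key=lambda kv: -kv[1])
    for term, score in ranked:
        print(term, round(score, 4))
```

The top-ranked terms from such a scoring would then serve as the feature set for an SVM classifier, which is the classification step the abstract describes and which is not shown here.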