Authors: HUANG Mengying; ZHANG Xiaobin[1] (School of Computer Science, Xi'an Polytechnic University, Xi'an 710048, China)
Affiliation: [1] School of Computer Science, Xi'an Polytechnic University, Xi'an, Shaanxi 710048
Source: Journal of Xi'an Polytechnic University (《西安工程大学学报》), 2018, No. 6, pp. 713-717 (5 pages)
Funding: Natural Science Foundation of Shaanxi Province (2015JQ5157)
Abstract: The chi-square statistic (CHI) ignores the influence of low-frequency words on text classification, while information gain (IG) considers only a term's contribution to the collection as a whole and ignores its local influence. To address these problems, the CHI and IG feature selection algorithms are analyzed, and a feature selection algorithm that fuses the two and is suited to sentiment text classification (CHI-IG) is proposed. The algorithm adds weights to the CHI and IG feature selection algorithms, combining the strengths of both while reducing the impact of their individual shortcomings. On this basis, the feature values of sentiment words are further weighted to distinguish them from non-sentiment words. Using this algorithm, sentiment texts are classified with random forest and support vector machine (SVM) classifiers. Experimental results show that the method effectively improves the classification efficiency of sentiment text.
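The record gives only the abstract, so the paper's exact weighting formulas are not available here. The following is a minimal sketch of the general idea under stated assumptions: per-term CHI (taking the maximum over classes) and IG are computed from document-level term presence, each is max-normalized, the two are linearly combined with a weight `alpha`, and terms found in a sentiment lexicon are multiplied by a factor `beta`. The function names, the linear combination, the normalization, and the `alpha`/`beta`/`sentiment_words` parameters are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cls):
    """Chi-square statistic of `term` w.r.t. class `cls` (docs are token sets)."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)      # present, in class
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)      # present, other class
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)  # absent, in class
    d_ = n - a - b - c                                                      # absent, other class
    denom = (a + c) * (b + d_) * (a + b) * (c + d_)
    return n * (a * d_ - b * c) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    if not total:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]."""
    n = len(labels)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    h_c = entropy(list(Counter(labels).values()))
    h_cond = (len(with_t) / n) * entropy(list(Counter(with_t).values())) \
        + (len(without_t) / n) * entropy(list(Counter(without_t).values()))
    return h_c - h_cond


def chi_ig_scores(docs, labels, alpha=0.5, sentiment_words=frozenset(), beta=1.5):
    """Hypothetical CHI-IG fusion: alpha*CHI + (1-alpha)*IG, boosted by beta
    for sentiment-lexicon terms. Weights and normalization are assumptions."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    chi = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    ig = {t: information_gain(docs, labels, t) for t in vocab}

    def normalize(scores):
        top = max(scores.values()) or 1.0  # avoid dividing by zero
        return {t: v / top for t, v in scores.items()}

    chi_n, ig_n = normalize(chi), normalize(ig)
    result = {}
    for t in vocab:
        score = alpha * chi_n[t] + (1 - alpha) * ig_n[t]
        if t in sentiment_words:
            score *= beta  # assumed extra weight for sentiment words
        result[t] = score
    return result
```

The highest-scoring terms would then serve as features for a downstream classifier such as random forest or SVM. A linear combination is used only because it is the simplest way to "add weights" to two scores; the paper may combine them differently.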
Keywords: chi-square statistic (CHI); information gain; feature selection; sentiment text; random forest; support vector machine
CLC number: TP301.6 [Automation and Computer Technology / Computer Systems Architecture]