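Authors: Hu Yi [1], Lu Ruzhan [1], Li Xuening [1], Duan Jianyong [1], Chen Yuquan [1]
Affiliation: [1] Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2007, No. 9, pp. 1469-1475 (7 pages)
Funding: National Natural Science Foundation of China, Major Program (60496326)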
Abstract: This paper presents a language modeling approach to the sentiment classification of text. Labeling the sentiment orientation of a text as "thumbs up" (praise) or "thumbs down" (criticism) provides semantic information beyond its topic. The motivation is simple: language models of "thumbs up" and "thumbs down" texts are likely to differ substantially, because the two kinds of text favor different language habits. The method exploits this divergence in two stages. First, two sentiment language models are estimated from the training data. Second, a test document is classified by comparing the Kullback-Leibler divergence between the language model estimated from the test document and each of the two trained sentiment models. Word unigrams and bigrams serve as the model parameters, and these parameters are estimated with either maximum likelihood estimation or smoothing. On a movie review corpus with limited training data, the language modeling approach outperforms support vector machines (SVMs) and a Naive Bayes classifier, and it shows robustness in sentiment classification. Future work may focus on estimating better language models, especially higher-order n-gram models and more powerful smoothing methods.
Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
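
The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the unigram case, it assumes add-alpha smoothing for the sentiment models (the abstract does not name the smoothing method), and the function names and toy reviews are hypothetical.

```python
import math
from collections import Counter


def train_unigram(docs, vocab_size, alpha=1.0):
    """Return a function w -> add-alpha smoothed unigram probability.

    Add-alpha smoothing is an assumption made for this sketch.
    """
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    denom = total + alpha * vocab_size
    return lambda w: (counts[w] + alpha) / denom


def kl_divergence(doc, model):
    """KL(P_doc || model), where P_doc is the maximum-likelihood
    unigram model of the test document."""
    counts = Counter(doc)
    n = len(doc)
    return sum((c / n) * math.log((c / n) / model(w)) for w, c in counts.items())


def classify(doc, pos_model, neg_model):
    """Pick the sentiment model closer to the document; ties go to "negative"."""
    if kl_divergence(doc, pos_model) < kl_divergence(doc, neg_model):
        return "positive"
    return "negative"


# Hypothetical toy training data (tokenized reviews), not from the paper.
pos_docs = [["great", "film", "loved", "it"],
            ["wonderful", "acting", "great", "story"]]
neg_docs = [["boring", "film", "hated", "it"],
            ["terrible", "acting", "boring", "story"]]

# Shared vocabulary, plus one slot reserved for unseen words.
vocab_size = len({w for d in pos_docs + neg_docs for w in d}) + 1

pos_model = train_unigram(pos_docs, vocab_size)
neg_model = train_unigram(neg_docs, vocab_size)

print(classify(["great", "story"], pos_model, neg_model))
print(classify(["boring", "acting"], pos_model, neg_model))
```

Smoothing the sentiment models keeps every probability in the denominator nonzero, so the divergence stays finite even when the test document contains words absent from one training set. The test document's own model is left unsmoothed here, so only words that occur in the document contribute to the sum.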