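Affiliations: [1] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013 [2] Department of Computer Science and Operations Research, Université de Montréal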
Source: Chinese Journal of Computers (《计算机学报》), 2018, No. 7, pp. 1574-1597 (24 pages)
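Funding: Supported by the National Natural Science Foundation of China (61762042; 61363039; 61562032), the Jiangxi Province Landing Plan Project (KJLD14035), and the Natural Science Foundation of Jiangxi Province (20171BAB202021; 20152ACB20003).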
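Abstract: Sentiment or emotion analysis is widely used in public opinion analysis, product review analysis, product recommendation and other fields, and sentiment or emotion analysis of text is usually based on a sentiment dictionary. Manually built sentiment dictionaries are accurate but costly to build and hard to update in time, so they adapt poorly to data such as microblogs, where new sentiment words turn over quickly. Microblog platforms provide a convenient channel for publishing and spreading new sentiment words and are an important source of them. Given the existing large manually built sentiment dictionaries and the large volume of microblog data containing new sentiment words, and building on a statistical analysis and comparison of how sentiment words are distributed in Chinese and English microblogs, this paper proposes cNSEm, a language-independent, classification-based method for extracting new sentiment words from microblogs. cNSEm automatically constructs training data from a microblog dataset and a sentiment dictionary, trains classifiers, classifies the sentiment polarity of candidate words, and finally determines each candidate word's polarity by a voting mechanism. Through extensive and detailed experiments, the paper analyzes the performance of cNSEm on Chinese and English microblog data, the role and usage of six categories of features, and how much the extracted new sentiment words help microblog sentiment classification. The experimental results show that cNSEm outperforms classical methods based on co-occurrence and polarity propagation, especially when noun sentiment words in the Chinese microblog dataset are considered. The extracted new sentiment words are evaluated both directly and indirectly: the direct evaluation uses a manually built sentiment dictionary as reference, and the indirect evaluation examines how much the extracted words help sentiment classification. Judged by the evaluation metrics, the new sentiment words extracted by cNSEm are comparable in quality to a manually built sentiment dictionary, and cNSEm adapts to two quite different languages, Chinese and English.

Text sentiment analysis tries to determine the orientation (attitude, point of view, or emotion) of information publishers. It is widely used in public opinion supervision, product review analysis, and similar fields, and has become one of the hottest topics in natural language processing, social media processing, data mining, etc. Sentiment analysis or emotion analysis on text is usually based on a sentiment dictionary. A manually built sentiment dictionary may produce high accuracy, but its coverage is limited and it is difficult to update, which makes it hard to cope with Web 2.0, where new sentiment words are created more frequently and spread more quickly. Microblog platforms, such as Twitter and Sina Weibo, allow users to publish and transmit information freely, and have become important sources of new sentiment words. Using large manually built sentiment dictionaries and microblog data containing many sentiment words, this paper analyzes the difference in distribution between Chinese and English sentiment words and proposes cNSEm, a classification-based method for extracting new sentiment words from microblogs. cNSEm automatically generates candidate samples, which are classified by a trained classifier and then sorted and extracted according to a voting strategy. Classification-based methods have been used to extract new sentiment words in some related works. However, most of them extracted sentiment words from web pages, WordNet, or product reviews, and candidate words are usually restricted to adjectives. cNSEm has to deal with not only the informal expression of microblogs but also an expanded set of candidate POS categories, especially when nouns are included. Based on carefully designed experiments, we analyze the performance of cNSEm on both Chinese and English microblogs. We also analyze and compare the impacts of the six categories of features used in cNSEm: context, POS, language mode, modification relationship, sentence features, and co-occurrence with other sentiment words. Experimental results show that six categories of features employed by cNSEm p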
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
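The voting step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the voting rule, so the per-occurrence majority vote, the label names (`pos`, `neg`, `neu`), the `min_share` threshold, the function name `vote_polarity`, and the sample words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def vote_polarity(predictions, min_share=0.6):
    """Aggregate per-occurrence classifier outputs into one polarity per word.

    predictions: iterable of (candidate_word, predicted_label) pairs, one pair
    per classified occurrence of the word in a microblog post.
    min_share: hypothetical threshold on the winning label's vote share.
    Returns (word, label, share) tuples sorted by share, highest first.
    """
    tallies = defaultdict(Counter)
    for word, label in predictions:
        tallies[word][label] += 1

    results = []
    for word, counts in tallies.items():
        label, votes = counts.most_common(1)[0]
        share = votes / sum(counts.values())
        # Keep a word only if its winning label is a sentiment polarity
        # (not neutral) and the vote is decisive enough.
        if label != "neu" and share >= min_share:
            results.append((word, label, share))

    # Sort by vote share (descending), then by word for a stable order.
    results.sort(key=lambda r: (-r[2], r[0]))
    return results


# Hypothetical classifier outputs for three candidate words.
predictions = [
    ("lit", "pos"), ("lit", "pos"), ("lit", "neu"),
    ("sus", "neg"), ("sus", "neg"),
    ("today", "neu"), ("today", "neu"), ("today", "pos"),
]
ranked = vote_polarity(predictions)
for word, label, share in ranked:
    print(f"{word}\t{label}\t{share:.2f}")
```

In this sketch a word classified as neutral in most of its occurrences is dropped, and the remaining words are ranked by how consistently the classifier agreed on their polarity.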