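Affiliations: [1] School of Computer and Information Engineering, Changzhou Institute of Technology, Changzhou 213002, Jiangsu, China; [2] Changzhou Key Laboratory of Software Technology Research and Application, Changzhou Institute of Technology, Changzhou 213002, Jiangsu, China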
Source: Journal of Nanjing University of Science and Technology (《南京理工大学学报》), 2014, No. 6, pp. 733-738, 749 (7 pages)
Funding: Changzhou Institute of Technology university-level research fund projects (YN1316; YN1203)
Abstract: Machine-learning-based sentiment analysis methods for Chinese microblogs suffer from complex processing and low classification accuracy. To address these problems, this paper proposes a new sentiment analysis method. A microblog crawler is combined with a Web application programming interface (API) to collect and preprocess dynamic microblog data. Sentiment words are extracted and classified using the NTUSD and HowNet Chinese sentiment dictionaries, and the semantic similarity and sentiment orientation of words are computed. The method jointly considers the weighting of emoticon and text sentiment orientation, the enhancement of positive sentiment, and other factors. Experimental results show that emoticon sentiment orientation plays an important role in the sentiment orientation of a microblog post; that, when the ratio of emoticon to text sentiment orientation is fixed, the choice of adjustment factor and neutral interval affects classification accuracy; and that, compared with a computational model based on HowNet semantic similarity, the proposed method improves the accuracy of sentiment orientation classification by about 5%.
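Keywords: text semantics; emoticon orientation; microblog; sentiment analysis; machine learning; microblog crawler; application programming interface; sentiment dictionary; semantic similarity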
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
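
The abstract describes combining emoticon and text sentiment orientation with a weighting, a positive-sentiment enhancement, and a neutral interval. The record does not give the actual formula, so the following is only a minimal sketch of how such a combination could look. The function name `combine_sentiment`, the parameters `emoticon_weight`, `positive_boost`, and `neutral_band`, their default values, and the assumption that scores lie in [-1, 1] are all hypothetical and are not taken from the paper.

```python
def combine_sentiment(text_score, emoticon_score,
                      emoticon_weight=0.5, positive_boost=1.0,
                      neutral_band=0.1):
    """Combine text and emoticon sentiment scores into one label.

    Hypothetical illustration only: scores are assumed to lie in [-1, 1],
    and every parameter default here is an assumption, not a value
    reported in the paper.
    """
    if not 0.0 <= emoticon_weight <= 1.0:
        raise ValueError("emoticon_weight must be between 0 and 1")

    # Assumed form of "positive sentiment enhancement":
    # scale up positive evidence by a constant factor.
    if text_score > 0:
        text_score *= positive_boost
    if emoticon_score > 0:
        emoticon_score *= positive_boost

    # Weighted sum of the emoticon and text orientations.
    score = (emoticon_weight * emoticon_score
             + (1.0 - emoticon_weight) * text_score)

    # Scores inside the neutral interval are labeled neutral.
    if abs(score) <= neutral_band:
        label = "neutral"
    elif score > 0:
        label = "positive"
    else:
        label = "negative"
    return score, label


if __name__ == "__main__":
    # Mildly positive text with a strongly positive emoticon.
    print(combine_sentiment(0.2, 0.8, emoticon_weight=0.6))
```

In this sketch, widening `neutral_band` moves borderline posts into the neutral class, which is one way the choice of neutral interval could affect classification accuracy as the abstract reports.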