Research on Regular Collocation Feature Extraction Methods in Online Review Sentiment Analysis  Cited by: 26

Regular Collocation Features Extraction Method in Online Reviews Sentiment Analysis


Authors: Wang Zuhui[1], Jiang Wei[1], Li Yijun[1]

Affiliation: [1] Institute of Information Management and Information Systems, Harbin Institute of Technology, Harbin, Heilongjiang 150001

Source: Journal of Industrial Engineering and Engineering Management (《管理工程学报》), 2014, No. 4, pp. 180-186 (7 pages)

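Funding: National Natural Science Foundation of China (71202168; 71271066); Fundamental Research Funds for the Central Universities (HIT.NSRIF2010083); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12511435)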

Abstract: Effective and stable feature extraction and feature representation are important factors in improving the performance of sentiment analysis of online reviews. Building on conventional features such as contiguous bag-of-words and trigger pairs, this paper studies how to extract and represent regular collocation features in online reviews. It proposes two strategies for extracting them, one combining mutual information with average mutual information and one based on rough sets. Starting from an analysis of the effectiveness and stability of feature extraction methods, it examines the internal and external stability of the extracted collocations, and it incorporates the filtered collocation features into several sentiment analysis models. Experiments on real hotel review data show that appropriate representation and filtering of regular collocation features effectively improve the classification accuracy of sentiment analysis models. The study also finds that, when sentiment feature words are unevenly distributed across reviews, an extraction strategy based on variable precision rough rules helps improve classification accuracy.

Precise sentiment orientation classification models and the extraction of effective, stable features from the review context are two essential factors affecting the performance of online review sentiment analysis. Among the many complicated features that arise from language complexity, regular collocation features play an important role: beyond conventional bag-of-words and trigger pair features, their structured expressions have a strong impact on sentiment orientation. To extract such features for online review sentiment analysis, this paper presents two novel approaches for capturing regular collocation features from review corpora: one combining mutual information with average mutual information, and one based on rough sets. The extracted regular collocation features are incorporated into sentiment analysis models as inputs. Experiments on real online hotel reviews achieve generally higher precision, improving the performance of SVM models by 0.34% and that of Naive Bayes models by 1.27%. For the extraction of regular collocation features, two aspects were considered essential to expressing the complicated constraints on review sentiment orientation: (1) the internal stability of the regular collocation structure, which accounts for the substantial existence of the regular collocation apart from traditional bag-of-words features or trigger pairs, and (2) the external effectiveness of the regular collocations, which accounts for their contribution to sentiment orientation classification. The mutual information method used in this paper measures external effectiveness, while the average mutual information computation and its filtering measure the internal stability of regular collocations. The rough set based method ensures internal stability and external effectiveness through an α approximation rough rule extraction strategy and a maximum likelihood estimate of the distribution of the regular collocations. On

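Keywords: sentiment analysis; regular collocation feature extraction; mutual information and average mutual information; rough set; support vector machine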

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]
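
Illustrative note: the abstract names two measures, mutual information for the external effectiveness of a collocation (its association with a sentiment class) and average mutual information for its internal stability (the association between its component words), but this record does not give the formulas. The sketch below uses the standard textbook definitions and should be read as an assumption, not as the authors' implementation. The function names `pointwise_mi` and `average_mi`, the toy reviews, and the simplification that a collocation is an unordered pair of words co-occurring in one review are all hypothetical.

```python
import math
from collections import Counter


def pointwise_mi(docs, labels, pair, target):
    """Pointwise MI (base 2) between presence of a word pair and a class.

    docs   : list of token sets, one per review (assumed representation)
    labels : list of class labels aligned with docs
    pair   : (w1, w2), counted as present when both words occur in a review
    """
    w1, w2 = pair
    n = len(docs)
    present = [w1 in d and w2 in d for d in docs]
    n_f = sum(present)
    n_y = sum(1 for y in labels if y == target)
    n_fy = sum(1 for p, y in zip(present, labels) if p and y == target)
    if n_fy == 0:
        return float("-inf")  # the pair never occurs with this class
    return math.log2(n_fy * n / (n_f * n_y))


def average_mi(docs, w1, w2):
    """Average MI between two words: the sum over all four
    presence/absence combinations of P(a, b) * log2(P(a, b) / (P(a) P(b)))."""
    n = len(docs)
    counts = Counter((w1 in d, w2 in d) for d in docs)
    total = 0.0
    for a in (True, False):
        for b in (True, False):
            n_ab = counts[(a, b)]
            if n_ab == 0:
                continue  # a zero-probability cell contributes nothing
            n_a = counts[(a, True)] + counts[(a, False)]
            n_b = counts[(True, b)] + counts[(False, b)]
            total += (n_ab / n) * math.log2(n_ab * n / (n_a * n_b))
    return total


# Toy hotel reviews (hypothetical data): 1 = positive, 0 = negative.
docs = [
    {"room", "clean", "very"},
    {"room", "clean"},
    {"room", "dirty"},
    {"staff", "rude"},
]
labels = [1, 1, 0, 0]

external = pointwise_mi(docs, labels, ("room", "clean"), target=1)
internal = average_mi(docs, "room", "clean")
print(external, internal)
```

Under these assumptions, a candidate pair would be kept as a collocation feature only if both scores exceed chosen thresholds; the thresholds, and the rough set based alternative, are not specified in this record and are left out of the sketch.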

 
