A Study of Feature Selection Methods for Semi-Supervised Sentiment Classification  (Cited by: 2)

Feature Selection Method for Semi-Supervised Sentiment Classification


Authors: Wang Zhihao[1,2], Wang Zhongqing[1,2], Li Shoushan[1,2], Li Peifeng, Shi Hanxiao[1,2]

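Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006; [2] School of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018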

Source: Journal of Chinese Information Processing (《中文信息学报》), 2013, No. 6, pp. 96-102 (7 pages)

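Funding: National Natural Science Foundation of China (61070123, 61003155); Open Project of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; Humanities and Social Sciences Youth Fund of the Ministry of Education (12YJC630170); Zhejiang Provincial Natural Science Foundation (LY13F020007, Z1110551)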

Abstract: Feature selection aims to reduce a high-dimensional feature space, thereby simplifying the problem and improving the learning method. Existing studies have shown that feature selection effectively reduces the dimensionality of the feature space in supervised sentiment classification. Unlike previous work, this paper is the first to investigate feature selection for semi-supervised sentiment classification, and it proposes a feature selection method based on a bipartite graph. First, the relations between documents and words are modeled as a bipartite graph. Then, combining the label information of a small set of labeled samples with the bipartite graph, a label propagation (LP) algorithm is applied to compute the probability that each feature belongs to each sentiment category. Finally, the features are ranked by these sentiment probabilities and selected accordingly. Experimental results across multiple domains show that, in semi-supervised sentiment classification, the bipartite-graph-based method clearly outperforms random feature selection and substantially reduces the dimensionality of the feature space without degrading (and sometimes even improving) classification performance.

Keywords: sentiment classification; semi-supervised learning; bipartite graph; label propagation; feature selection

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
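
The three steps summarized in the abstract (bipartite document-word graph, label propagation from a few labeled documents, ranking features by sentiment probability) can be illustrated with a minimal sketch. The abstract does not give the edge weights, the update rule, or the ranking criterion, so everything below is an assumption made for illustration: edges are weighted by term counts, propagation alternates between words and documents using row-normalized weights, labeled documents are clamped after each round, and features are ranked by how far their positive probability lies from 0.5. The function names `feature_sentiment_scores` and `select_features` are hypothetical and do not come from the paper.

```python
import numpy as np


def feature_sentiment_scores(X, y, n_iter=20):
    """Propagate labels over a document-word bipartite graph (illustrative sketch).

    X: (n_docs, n_words) non-negative document-term count matrix.
    y: length n_docs; 1 = positive, 0 = negative, -1 = unlabeled.
    Returns an (n_words, 2) array of [negative, positive] probabilities.
    Assumes every document contains at least one word.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0

    doc_tot = X.sum(axis=1)   # total term count per document
    word_tot = X.sum(axis=0)  # total occurrences per word

    # Row-normalized edge weights for the two directions of the bipartite graph.
    doc_to_word = X / np.maximum(doc_tot, 1e-12)[:, None]      # (n_docs, n_words)
    word_to_doc = X.T / np.maximum(word_tot, 1e-12)[:, None]   # (n_words, n_docs)

    # Labeled documents start one-hot; unlabeled documents start undecided.
    D = np.full((X.shape[0], 2), 0.5)
    D[labeled] = np.eye(2)[y[labeled]]
    clamp = D[labeled].copy()

    W = np.full((X.shape[1], 2), 0.5)
    for _ in range(n_iter):
        W = word_to_doc @ D   # each word averages over the documents containing it
        D = doc_to_word @ W   # each document averages over the words it contains
        D[labeled] = clamp    # keep the known labels fixed

    W[word_tot == 0] = 0.5    # words that never occur carry no evidence
    return W


def select_features(word_probs, k):
    """Return indices of the k words whose positive probability is farthest from 0.5."""
    confidence = np.abs(word_probs[:, 1] - 0.5)
    return np.argsort(-confidence)[:k]
```

Under these assumptions, a word that occurs equally often in positive and negative contexts ends up near 0.5 and is ranked last, while words concentrated in documents of one polarity are kept. Other choices (TF-IDF edge weights, a different stopping rule, or ranking by the larger class probability) would fit the abstract's description equally well.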

 
