Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus

Authors: Zhu Haihuan (朱海欢), Yu Qingsong (余青松)[1]

Affiliation: [1] Computer Center, East China Normal University, Shanghai 200062

Source: Journal of East China Normal University (Natural Science), 2014, No. 4, pp. 62-68, 87 (8 pages in total)

Abstract: This paper proposes a method for extracting subjective sentences from Chinese microblogs that combines a lexicon with a corpus. A sentence is classified as subjective if it contains sentiment expression text, and as objective otherwise. First, sentiment words whose sentiment orientation is relatively fixed are selected from existing sentiment lexicons to build a highly credible sentiment lexicon; this lexicon is used to extract sentiment expressions from a sentence and ensures the precision of that extraction. Then, an N-POSW model is proposed, and the 2-POSW model is used to extract the remaining sentiment expressions in the sentence fairly accurately through corpus learning, which ensures recall. Experimental results show that, compared with the traditional method based on a large-scale sentiment lexicon, the proposed method improves the F-value of subjective sentence extraction by 7%.
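The abstract outlines a two-stage pipeline but does not define the N-POSW model, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical "2-POSW" key made of the POS tag of a token's left neighbour plus the token's word, learns such keys from a toy annotated corpus using hypothetical frequency thresholds (`min_count`, `min_ratio`), and then applies the decision rule stated in the abstract: a sentence is subjective if and only if at least one sentiment expression is found. All function names are hypothetical. `f_value` computes the balanced F-measure, assuming that is what the paper's "F-value" denotes.

```python
from collections import Counter
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag) from any Chinese segmenter/tagger
Key = Tuple[str, str]    # hypothetical 2-POSW key: (POS of left neighbour, word)


def posw_key(sentence: List[Token], i: int) -> Key:
    """Hypothetical 2-POSW key for token i: left neighbour's POS (or <S>) + word."""
    left_pos = sentence[i - 1][1] if i > 0 else "<S>"
    return (left_pos, sentence[i][0])


def learn_patterns(corpus: List[Tuple[List[Token], Set[int]]],
                   min_count: int = 2, min_ratio: float = 0.8) -> Set[Key]:
    """Learn keys that were mostly annotated as sentiment expressions.

    corpus: pairs of (tagged sentence, indices of tokens annotated as
    sentiment expressions). Thresholds are illustrative, not from the paper.
    """
    hits: Counter = Counter()
    seen: Counter = Counter()
    for sentence, gold in corpus:
        for i in range(len(sentence)):
            key = posw_key(sentence, i)
            seen[key] += 1
            if i in gold:
                hits[key] += 1
    return {k for k, c in hits.items()
            if c >= min_count and c / seen[k] >= min_ratio}


def sentiment_expressions(sentence: List[Token], lexicon: Set[str],
                          patterns: Set[Key]) -> List[int]:
    """Indices of tokens judged to be sentiment expressions."""
    # Stage 1 (precision): exact match against the high-credibility lexicon.
    found = {i for i, (word, _) in enumerate(sentence) if word in lexicon}
    # Stage 2 (recall): corpus-learned keys catch the remaining expressions.
    found |= {i for i in range(len(sentence))
              if posw_key(sentence, i) in patterns}
    return sorted(found)


def is_subjective(sentence: List[Token], lexicon: Set[str],
                  patterns: Set[Key]) -> bool:
    # Decision rule from the abstract: subjective iff any sentiment expression.
    return bool(sentiment_expressions(sentence, lexicon, patterns))


def f_value(pred: List[bool], gold: List[bool]) -> float:
    """Balanced F-measure 2PR / (P + R) for the 'subjective' class."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    lexicon = {"喜欢", "糟糕"}  # toy high-credibility lexicon
    train = [
        ([("这", "r"), ("手机", "n"), ("给力", "a")], {2}),
        ([("今天", "t"), ("天气", "n"), ("给力", "a")], {2}),
        ([("我", "r"), ("在", "p"), ("上海", "ns")], set()),
    ]
    patterns = learn_patterns(train)  # {('n', '给力')}
    test = [[("我", "r"), ("喜欢", "v"), ("上海", "ns")],  # lexicon hit
            [("电影", "n"), ("给力", "a")],                # learned-key hit
            [("我", "r"), ("在", "p"), ("上海", "ns")]]    # neither
    print([is_subjective(s, lexicon, patterns) for s in test])  # [True, True, False]
```

The split mirrors the rationale given in the abstract: a small lexicon of words with fixed orientation keeps stage 1 precise, while context-based keys learned from a corpus recover expressions the lexicon misses, trading some precision for recall.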

Keywords: sentiment lexicon; highly credible sentiment lexicon; N-POSW model; subjective sentence

CLC number: TP39 [Automation and Computer Technology / Computer Application Technology]