Sentiment analysis based on latent Dirichlet allocation (基于隐狄利克雷分配的情感分析)  Cited by: 1

Authors: Wang Jianfang (王建芳)[1], Liu Feng (刘峰)[1]

Affiliation: [1] School of Computer and Information Technology, Nanyang Normal University, Nanyang, Henan 473061, China

Source: Computer Engineering and Design (《计算机工程与设计》), 2014, No. 6, pp. 2179-2182, 2213 (5 pages)

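Funding: Science and Technology Research Fund of the Henan Provincial Department of Science and Technology (122102210483; 102102210465); Henan Province Major Science and Technology Research Fund (122102110274)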

Abstract: This paper proposes a method for generating a lexical resource for Chinese sentiment analysis, named ChineseClues, and, building on that resource, an unsupervised LDA-based sentiment analysis algorithm called LDASA. First, an automatic translation approach translates existing English sentiment clues into Chinese. Next, an iterative refinement step corrects the errors introduced by translation, and topic-based polar word sets are derived from the corrected clues. Finally, a classification algorithm assigns each document to its polarity. The method was evaluated on three data sets of hotel, cell phone, and digital camera reviews gathered from e-commerce websites. In these experiments, LDASA outperformed a support vector machine classifier with unigram features, raising average polarity classification accuracy by 10 percentage points.
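The final step described in the abstract, assigning a polarity to each document from topic-based polar word sets, can be illustrated with a minimal sketch. It is not the authors' LDASA implementation: the record does not describe the classifier, so simple clue counting is swapped in. The topic is assumed to be known in advance rather than inferred by LDA, and the names `POLAR_SETS` and `classify_polarity` and the English example words are hypothetical placeholders, not entries from ChineseClues.

```python
from collections import Counter

# Hypothetical topic-based polar sets (illustrative words, not taken from the paper).
POLAR_SETS = {
    "hotel": {
        "positive": {"clean", "quiet", "friendly"},
        "negative": {"dirty", "noisy", "rude"},
    },
    "phone": {
        "positive": {"fast", "sharp", "durable"},
        "negative": {"slow", "laggy", "fragile"},
    },
}


def classify_polarity(tokens, topic, polar_sets=POLAR_SETS):
    """Label a tokenized document by counting matches against the
    positive and negative clue sets of the given topic.

    Assumption: the topic is supplied by the caller; in an LDA-based
    pipeline it would come from the topic model instead.
    """
    counts = Counter(tokens)
    clues = polar_sets[topic]
    pos = sum(counts[w] for w in clues["positive"])
    neg = sum(counts[w] for w in clues["negative"])
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie or no clue matched


review = "the room was clean and quiet but the staff was rude".split()
label = classify_polarity(review, "hotel")
```

Keeping a separate clue set per topic reflects the idea that the same word can carry different polarity in different domains; any weighting, negation handling, or word segmentation for Chinese text would have to be added on top of this sketch.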

Keywords: sentiment analysis; Chinese sentiment lexicon; latent Dirichlet allocation; topic words; classification algorithm

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
