Lexicon and Machine Learning Based Sentiment Analysis of Tibetan Microblogs

Cited by: 4


Author: YANG Zhi (杨志) (Qinghai University for Nationalities, Xining 810007)

Affiliation: [1] School of Computer Science, Qinghai University for Nationalities, Xining, Qinghai 810000

Source: Software (《软件》), 2017, No. 11, pp. 46-48, 94 (4 pages)

Funding: Qinghai University for Nationalities university-level science and engineering project (2016XJQ06)

Abstract: With the rise of internet self-media in the Web 2.0 era, more and more Tibetans have begun to use microblogs to express their opinions and views, and Tibetan information processing research related to microblogs has drawn wide attention from the academic community. Based on the writing characteristics of Tibetan microblogs, this paper proposes a multi-feature Tibetan sentiment classification method that fuses a sentiment lexicon with three kinds of machine learning algorithms. For feature selection, Tibetan and Chinese sentiment words, morphological sequences, emoticons, and other items are used as features. Experiments showed that classification performance was not ideal because the coverage of the constructed sentiment lexicon was too low. To improve the results, the paper introduces information gain feature selection, and the experiments show that it gives considerably better classification results than manually selected features. For a specific domain (film topics), the experiments show that the fused classifier performs better than a single classifier.
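The information gain feature selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names (`entropy`, `information_gain`, `select_top_terms`), the binary present/absent treatment of each term, and the toy tokenized posts are all assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term present / absent'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Conditional entropy of the labels given presence or absence of the term.
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Return the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    scored = [(information_gain(docs, labels, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]


# Hypothetical toy corpus: each post is a set of tokens (placeholders, not data from the paper).
docs = [
    {"good", "film", ":)"},
    {"good", "story"},
    {"bad", "film", ":("},
    {"bad", "story"},
]
labels = ["pos", "pos", "neg", "neg"]

top = select_top_terms(docs, labels, 2)
print(top)
```

In this toy corpus the terms "good" and "bad" separate the two classes perfectly, so they rank above topic words such as "film" that occur equally often in both classes. Automatic ranking of this kind is the general idea behind replacing a hand-picked feature list with one chosen by information gain.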

Keywords: natural language processing; sentiment classification; microblog; machine learning; feature selection; feature term weighting

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
