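Author: YANG Zhi (杨志) (Qinghai University For Nationalities, Xining 810007)
Affiliation: [1] School of Computer Science, Qinghai University For Nationalities (青海民族大学), Xining, Qinghai 810000
Source: Software (《软件》), 2017, No. 11, pp. 46-48, 94 (4 pages)
Funding: Qinghai University For Nationalities university-level science and engineering project (2016XJQ06)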
Abstract: With the rise of Web 2.0 and self-published social media, more and more Tibetans have begun using microblogs to express their opinions and views, and Tibetan information processing research related to microblogs has drawn wide attention from the academic community. Based on the writing characteristics of Tibetan microblogs, this paper proposes a multi-feature Tibetan sentiment classification method that fuses a sentiment lexicon with three machine learning algorithms. For feature selection, it uses Tibetan and Chinese sentiment words, morphological sequences, emoticons, and other features. The experiments show that classification performance was not ideal because the constructed sentiment lexicon had insufficient coverage. To improve the results, the paper introduces information gain feature selection, and the experiments show that it yields considerably better classification results than manually selected features. For a specific domain (film topics), the experiments show that the fused classifier performs better than a single classifier.
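Keywords: natural language processing; sentiment classification; microblog; machine learning; feature selection; feature term weighting
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]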
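
Note: the abstract names information gain as the feature selection step but gives no formula or implementation. The following is a minimal sketch of the standard information gain criterion for a binary "term present / term absent" feature, not the paper's own code. The function names, the set-of-tokens document representation, and the toy English data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of token sets, labels the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Expected entropy of the labels after splitting on the feature.
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of tokens (words, emoticons).
docs = [
    {"good", "film", ":)"},
    {"great", "film"},
    {"bad", "film", ":("},
    {"boring", "plot", ":("},
]
labels = ["pos", "pos", "neg", "neg"]

top_terms = select_features(docs, labels, 1)
```

In this toy corpus the emoticon `:(` separates the two classes perfectly, so it ranks first, while a term such as `film` that appears in both classes scores much lower. This is the kind of ranking an information gain filter uses to decide which terms to keep.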