Authors: LIN Jianghao (林江豪), GU Yeli (顾也力) [2], ZHOU Yongmei (周咏梅), YANG Aimin (阳爱民), CHEN Jin (陈锦) [1]
Affiliations: [1] Laboratory for Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006; [2] Faculty of Asian Languages and Cultures, Guangdong University of Foreign Studies, Guangzhou 510420; [3] School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510006
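Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 6, pp. 1400-1404, 1439 (6 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 61877013); the Humanities and Social Sciences Project of the Ministry of Education (No. 14YJA740011); the Guangdong Province Philosophy and Social Sciences "Twelfth Five-Year" Plan Project (No. GD15YTS01); the Guangzhou Philosophy and Social Sciences "Thirteenth Five-Year" Plan 2018 Project (No. 2018GZQN27); and the Guangdong Province Science and Technology Plan Project (No. 2017A040406025).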
Abstract: To address shortcomings in existing research on fine-grained emotion classification of short texts, this paper proposes a method based on the Probabilistic Latent Semantic Analysis (PLSA) model and K-means clustering. First, PLSA is used to compute the document-topic and word-topic probability matrices of the corpus. Second, K-means clusters the probability distributions of words over topics, and similar topics are merged. Third, using an emotion ontology lexicon, each merged topic is assigned an emotion category, on the assumption that the topic in which the words of a given emotion category have the highest probability shares that category. Finally, using the merged document-topic probability matrix, each document is assigned the emotion category of the topic in which it has the highest probability. Experimental results show that PLSA+K-means achieves higher classification accuracy than PLSA alone, with an overall classification accuracy of 95.23%.
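The four steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`plsa`, `kmeans`, `merge_topics`, `label_topics`, `classify`), the farthest-point K-means initialisation, the unweighted averaging of merged word-topic rows, and the per-topic reading of the lexicon rule are all choices made here for illustration. The record does not specify any of them.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (docs x words) count matrix.
    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape (topics, words)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]        # (docs, topics, words)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both matrices from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def kmeans(points, k, n_iter=20):
    """Plain K-means with a deterministic farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def merge_topics(p_z_d, p_w_z, labels):
    """Merge topics that share a cluster label.
    Document-topic probabilities are summed; word-topic rows are averaged (a simplification)."""
    groups = [labels == j for j in np.unique(labels)]
    merged_z_d = np.stack([p_z_d[:, g].sum(axis=1) for g in groups], axis=1)
    merged_w_z = np.stack([p_w_z[g].mean(axis=0) for g in groups], axis=0)
    return merged_z_d, merged_w_z

def label_topics(merged_w_z, lexicon):
    """Give each merged topic the emotion whose lexicon words carry the most probability mass.
    `lexicon` maps an emotion name to a list of word indices (hypothetical format)."""
    emotions = list(lexicon)
    mass = np.stack([merged_w_z[:, lexicon[e]].sum(axis=1) for e in emotions], axis=1)
    return [emotions[i] for i in mass.argmax(axis=1)]

def classify(merged_z_d, topic_emotions):
    """Give each document the emotion of its most probable merged topic."""
    return [topic_emotions[i] for i in merged_z_d.argmax(axis=1)]
```

A typical call chain would be `plsa` on a term-count matrix, `kmeans` on the rows of the returned word-topic matrix, then `merge_topics`, `label_topics`, and `classify`. Summing document-topic probabilities over merged topics means a document whose probability is spread across several similar topics can still be assigned their shared emotion, which is one plausible reason merging could help accuracy.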
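Keywords: emotion classification; emotion category; PLSA model; K-means
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]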