Authors: Galimkair Mutanov, Vladislav Karyukin, Zhanl Mamykova
Affiliation: [1] Al-Farabi Kazakh National University, Almaty, 050040, Kazakhstan
Source: Computers, Materials & Continua (English-language journal), 2021, Issue 10, pp. 913-930 (18 pages)
Abstract: The volume of social media data on the Internet is constantly growing, which has created a substantial research field for data analysts. The diversity of articles, posts, and comments on news websites and social networks is striking. Nevertheless, most researchers focus on Twitter posts, which have a specific format and length restriction, and most of these posts are written in English. As relatively few works have addressed sentiment analysis in the Russian and Kazakh languages, this article thoroughly analyzes news posts in the Kazakhstan media space. The collected datasets include texts labeled with three sentiment classes: positive, negative, and neutral. The datasets are highly imbalanced, with a significant predominance of the positive class. To deal with this issue, three resampling techniques (undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE)) are used to resample the datasets. The texts are then vectorized with the TF-IDF metric and classified with seven machine learning (ML) algorithms: naïve Bayes, support vector machine, logistic regression, k-nearest neighbors, decision tree, random forest, and XGBoost. Experimental results show that oversampling and SMOTE combined with logistic regression, decision tree, and random forest achieve the best classification scores. These models are used in the social analytics platform developed by the authors.
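The abstract describes a concrete pipeline: resample the imbalanced three-class data, vectorize the texts with TF-IDF, then classify. This record does not say which implementation the authors used, so the following is only a minimal, standard-library sketch of the TF-IDF and resampling steps under stated assumptions: whitespace tokenization, a smoothed IDF with L2 normalization, and Euclidean nearest neighbours for SMOTE. The function names (`tfidf`, `random_undersample`, `random_oversample`, `smote`) and the toy sentences are hypothetical illustrations, not the paper's code or data; the seven classifiers are omitted.

```python
# Hypothetical sketch of TF-IDF + resampling; not the authors' implementation.
import math
import random
from collections import Counter


def tfidf(docs):
    """Whitespace-tokenize docs and return (vocab, L2-normalized TF-IDF vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def _indices_by_class(y):
    """Map each label to the list of row indices carrying it."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    return by_class


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    by_class = _indices_by_class(y)
    target = min(len(idx) for idx in by_class.values())
    out_X, out_y = [], []
    for label, idx in by_class.items():
        for i in rng.sample(idx, target):
            out_X.append(list(X[i]))
            out_y.append(label)
    return out_X, out_y


def random_oversample(X, y, rng):
    """Duplicate random members of each smaller class up to the largest class size."""
    by_class = _indices_by_class(y)
    target = max(len(idx) for idx in by_class.values())
    out_X, out_y = [list(x) for x in X], list(y)
    for label, idx in by_class.items():
        for _ in range(target - len(idx)):
            out_X.append(list(X[rng.choice(idx)]))
            out_y.append(label)
    return out_X, out_y


def smote(X, y, rng, k=5):
    """SMOTE: add synthetic points on the segment between a minority sample
    and one of its k nearest same-class neighbours (Euclidean distance)."""
    by_class = _indices_by_class(y)
    target = max(len(idx) for idx in by_class.values())
    out_X, out_y = [list(x) for x in X], list(y)
    for label, idx in by_class.items():
        if len(idx) < 2:  # no neighbour to interpolate with
            continue
        for _ in range(target - len(idx)):
            i = rng.choice(idx)
            neighbours = sorted((j for j in idx if j != i),
                                key=lambda j: math.dist(X[i], X[j]))[:k]
            j = rng.choice(neighbours)
            gap = rng.random()  # position along the segment X[i] -> X[j]
            out_X.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            out_y.append(label)
    return out_X, out_y


if __name__ == "__main__":
    # Hypothetical toy corpus: 5 positive, 2 negative, 2 neutral texts.
    docs = ["good news today", "great progress for the region", "good result",
            "great day", "good growth", "bad news today", "bad result",
            "report published", "meeting held today"]
    labels = ["positive"] * 5 + ["negative"] * 2 + ["neutral"] * 2
    vocab, X = tfidf(docs)
    rng = random.Random(0)
    for name, fn in [("undersample", random_undersample),
                     ("oversample", random_oversample), ("smote", smote)]:
        Xr, yr = fn(X, labels, rng)
        print(name, sorted(Counter(yr).items()))
    # undersample [('negative', 2), ('neutral', 2), ('positive', 2)]
    # oversample [('negative', 5), ('neutral', 5), ('positive', 5)]
    # smote [('negative', 5), ('neutral', 5), ('positive', 5)]
```

Undersampling trims every class to the minority size, while oversampling and SMOTE grow every class to the majority size; SMOTE differs from plain oversampling in that the added rows are new interpolated vectors rather than copies. In practice these steps are usually delegated to libraries such as scikit-learn (`TfidfVectorizer`) and imbalanced-learn (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE`), with resampling applied only to the training split so that duplicated or synthetic rows do not leak into the evaluation data.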
Keywords: social media; sentiment analysis; imbalanced classes; machine learning; oversampling; undersampling; SMOTE; Russian; Kazakh
Classification number (CLC): TP1 [Automation and Computer Technology: Control Theory and Control Engineering]