Authors: XU Min-zhang (徐闽樟); CHEN Yu-zhong (陈羽中) [1,2]
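Affiliations: [1] College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou 350116, China; [2] Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou 350116, China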
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2021, No. 11, pp. 2292-2299 (8 pages)
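Funding: Supported by the National Natural Science Foundation of China (61672158, 61672159, 61502104, 61502105); the Fujian Province University Industry-Academia Cooperation Project (2018H6010); the Fujian Province Science and Technology Guiding Project (2017H001); and the Natural Science Foundation of Fujian Province (2017J01752, 2018J01795).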
Abstract: The rapid development of the information age has given rise to black-market industries such as comment spamming. With the rise of machine learning, many effective methods for identifying spam comments have been developed. Traditional statistical machine learning methods rely on manual feature engineering to extract features that distinguish spam comments from genuine ones, which often requires considerable effort in feature selection; deep learning methods instead use neural networks to learn comment features automatically. However, because labeled data is hard to obtain, existing deep learning models still suffer from serious overfitting. In addition, training directly on the comment text without considering topic information makes learning difficult and weakens generalization. To address these problems, this paper proposes Topic-SpamGAN (Topic-Spam Generative Adversarial Network), a generative adversarial network model that integrates topic information for spam comment classification. To cope with the difficulty of obtaining labeled samples, Topic-SpamGAN uses a GAN to fit the real labeled samples, which improves the training of the classifier. Topic-SpamGAN also uses reinforcement learning to help train the generator and improve the quality of the generated samples. Furthermore, Topic-SpamGAN introduces topic information into model learning to make the generated text more relevant, and uses topic information to guide classification learning, which makes training more stable. Experimental results on a hotel dataset show that Topic-SpamGAN outperforms existing spam comment classification models.
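Keywords: spam comment classification; generative adversarial network; topic classification; semi-supervised learning; reinforcement learning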
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]